0
00:00:02.135 --> 00:00:06.145
Welcome everyone to our ATAR Notes General Maths 3/4

1
00:00:06.145 --> 00:00:07.745
Unit 3 Head Start lecture.

2
00:00:08.205 --> 00:00:10.145
Um, this is part of our January lecture series.

3
00:00:10.245 --> 00:00:11.385
My name's Josh. Um,

4
00:00:11.405 --> 00:00:13.625
and for the next two hours we're gonna be going

5
00:00:13.625 --> 00:00:15.585
through a fair bit of Unit three Head Start.

6
00:00:16.595 --> 00:00:18.565
I'll go through, go through the specifics of that,

7
00:00:18.985 --> 00:00:23.845
but most of what we will cover will be, uh, Area Study one.

8
00:00:23.905 --> 00:00:26.365
We won't cover any of finance, so Area Study two.

9
00:00:26.785 --> 00:00:28.805
And we won't cover all of Area Study one just

10
00:00:28.805 --> 00:00:32.685
because Area Study one is so big it makes up about double

11
00:00:33.065 --> 00:00:35.965
or makes up double what finance does in unit three,

12
00:00:36.145 --> 00:00:39.085
but essentially what it makes up is about 66% of it.

13
00:00:39.145 --> 00:00:40.885
So we're gonna go through about 50%

14
00:00:40.885 --> 00:00:42.525
of unit three if we think of it like that.

15
00:00:44.395 --> 00:00:48.455
Um, just a couple of things to start off.

16
00:00:48.715 --> 00:00:50.215
Um, I'd just like to welcome you all

17
00:00:50.215 --> 00:00:51.735
to our ATAR Notes lecture series.

18
00:00:51.735 --> 00:00:53.495
So, ATAR Notes has been around for a very long time,

19
00:00:53.625 --> 00:00:54.695
since about 2007.

20
00:00:55.395 --> 00:00:58.415
Um, and they provide free resources, um, as well

21
00:00:58.415 --> 00:01:03.175
as some new paid resources, um, to students who go

22
00:01:03.175 --> 00:01:04.855
through VCE, HSC.

23
00:01:05.045 --> 00:01:08.215
They even now, um, operate in WA and Queensland.

24
00:01:08.355 --> 00:01:10.055
So it's really expanding from

25
00:01:10.055 --> 00:01:12.295
what it was really just doing VCE about, you know,

26
00:01:12.395 --> 00:01:13.455
10, 15 years ago.

27
00:01:14.635 --> 00:01:18.495
Um, so we offer all of these services, um, such as,

28
00:01:18.495 --> 00:01:22.495
as you can see here, uh, student notes, lectures, uh,

29
00:01:22.505 --> 00:01:25.455
discussion videos and all of those sort of things.

30
00:01:25.455 --> 00:01:27.615
And now just to also point out, if you notice

31
00:01:27.615 --> 00:01:30.015
that I'm looking to my right, left,

32
00:01:30.215 --> 00:01:31.735
whatever it is on your screen, um, that's

33
00:01:31.735 --> 00:01:34.095
because I have my slides up on a monitor,

34
00:01:34.475 --> 00:01:36.455
so I'm not looking away from you.

35
00:01:36.755 --> 00:01:39.335
My camera is here, my monitor is there. That is why.

36
00:01:40.475 --> 00:01:42.255
Um, but also what's really cool is we've

37
00:01:42.255 --> 00:01:43.455
started this new program.

38
00:01:43.635 --> 00:01:45.415
So we used to have a thing called Ed Unlimited,

39
00:01:45.415 --> 00:01:46.415
you've probably heard of that before.

40
00:01:46.995 --> 00:01:49.175
Um, and we also have TuteSmart,

41
00:01:49.175 --> 00:01:50.455
which is our paid tutoring program,

42
00:01:50.455 --> 00:01:52.095
but we now have a Notes Plus,

43
00:01:52.095 --> 00:01:55.215
which is essentially a subscription based program in which

44
00:01:55.875 --> 00:02:00.235
we cover, uh, lots of new content

45
00:02:00.695 --> 00:02:02.595
and we have all our ATAR Notes books,

46
00:02:02.595 --> 00:02:04.195
which are still available as hard copies.

47
00:02:04.255 --> 00:02:05.755
So we have our books, we have our topic tests

48
00:02:05.755 --> 00:02:08.555
and our course notes, um, all available on online platform

49
00:02:08.615 --> 00:02:10.395
as well as now an AI chat bot,

50
00:02:10.525 --> 00:02:11.875
which is really, really useful.

51
00:02:11.945 --> 00:02:13.995
It's been, um, it's learned from all

52
00:02:13.995 --> 00:02:15.035
of our ATAR Notes books

53
00:02:15.035 --> 00:02:18.075
and it can answer really complex questions in terms of VCE

54
00:02:18.255 --> 00:02:19.595
and we also now have flashcards.

55
00:02:19.695 --> 00:02:22.315
Um, and there's also some questions and so forth on there

56
00:02:22.695 --> 00:02:25.275
and, uh, other resources such as exams and so forth.

57
00:02:26.415 --> 00:02:29.115
Um, so moving forward, just some general housekeeping.

58
00:02:29.495 --> 00:02:31.435
Um, please utilize the chat for those of you

59
00:02:31.435 --> 00:02:33.315
who are here at the sort of live premiere.

60
00:02:33.975 --> 00:02:36.195
Um, please utilize the chat.

61
00:02:36.335 --> 00:02:38.595
Um, I will be there for the whole two hours, um,

62
00:02:38.695 --> 00:02:40.155
and I'll be able to answer any

63
00:02:40.155 --> 00:02:41.395
of your questions during that time.

64
00:02:41.645 --> 00:02:43.395
After that, the chat will still be available

65
00:02:43.455 --> 00:02:44.675
and this recording will be available

66
00:02:44.855 --> 00:02:46.355
for I believe about a week or two weeks.

67
00:02:46.935 --> 00:02:50.235
Um, and then what it will do is it'll go onto a Notes Plus.

68
00:02:50.735 --> 00:02:53.395
Um, in that time, please feel free to ask any questions.

69
00:02:53.395 --> 00:02:55.635
The chat will still be available afterwards just to look at,

70
00:02:55.635 --> 00:02:57.075
you won't be able to ask anything.

71
00:02:57.255 --> 00:02:58.835
Um, I won't be available to answer anything,

72
00:02:58.855 --> 00:03:00.085
but you can look through and see

73
00:03:00.085 --> 00:03:01.125
if your question's already been asked.

74
00:03:01.785 --> 00:03:03.005
Um, and as it says there,

75
00:03:03.005 --> 00:03:04.285
the lecture will be available afterwards.

76
00:03:05.385 --> 00:03:07.525
Um, so what we're gonna cover today, um, so

77
00:03:07.525 --> 00:03:09.685
what we're gonna cover today is we're gonna cover just some

78
00:03:09.885 --> 00:03:11.645
overview and study tips because it's really important

79
00:03:11.795 --> 00:03:13.965
with general, I will call it further at times.

80
00:03:14.045 --> 00:03:15.565
I do apologize. When I went through this topic,

81
00:03:15.585 --> 00:03:17.485
it was called further, it's now called General.

82
00:03:18.145 --> 00:03:20.765
Um, there are a couple of things that are really important

83
00:03:20.985 --> 00:03:23.005
for how, uh, sort of

84
00:03:23.665 --> 00:03:25.365
you should go about studying for this subject.

85
00:03:25.425 --> 00:03:27.725
It is slightly different to other subjects such as,

86
00:03:27.725 --> 00:03:30.365
you know, your, your math methods or your English

87
00:03:30.625 --> 00:03:34.325
or your sciences or, um, your humanities subjects.

88
00:03:34.325 --> 00:03:36.885
It is slightly different in terms of how I would approach it

89
00:03:37.025 --> 00:03:38.165
and I'll sort of discuss that there.

90
00:03:38.315 --> 00:03:40.565
Then we're gonna go through univariate and bivariate data.

91
00:03:40.865 --> 00:03:42.605
I'm gonna go through a little bit of the application.

92
00:03:42.605 --> 00:03:44.485
We're not gonna go into sort of our seasonal,

93
00:03:44.865 --> 00:03:45.925
if anyone knows what that is.

94
00:03:46.025 --> 00:03:47.725
If you don't, don't worry. You'll cover

95
00:03:47.725 --> 00:03:48.725
that later on in the topic.

96
00:03:48.985 --> 00:03:50.605
Uh, but we won't go through that side of things.

97
00:03:51.665 --> 00:03:52.885
And then a little bit about who I am.

98
00:03:53.425 --> 00:03:54.645
Um, so obviously my name is Josh.

99
00:03:54.905 --> 00:03:56.445
Um, I've now been tutoring for,

100
00:03:56.445 --> 00:03:58.285
this is going into my sixth year of tutoring,

101
00:03:58.505 --> 00:03:59.605
um, which is a long time.

102
00:03:59.945 --> 00:04:02.125
Um, it'll probably be my last, um,

103
00:04:02.185 --> 00:04:03.645
I'm in my final year of medical school.

104
00:04:03.825 --> 00:04:06.965
Um, so I've done four years of medical school plus a year

105
00:04:06.965 --> 00:04:08.725
of honors, and now I'm going into my final

106
00:04:08.745 --> 00:04:12.205
and fifth year of medical school, um, at Monash University.

107
00:04:12.505 --> 00:04:14.765
Um, I have two very adorable pets at home.

108
00:04:15.025 --> 00:04:16.445
Um, I'm not originally from Melbourne.

109
00:04:16.625 --> 00:04:17.845
Um, I have a dog called May

110
00:04:17.905 --> 00:04:19.525
and I have a very chubby cat called Bella.

111
00:04:19.945 --> 00:04:23.765
Um, and they are the things that motivate me to get

112
00:04:23.765 --> 00:04:25.325
through everything every day.

113
00:04:25.705 --> 00:04:27.845
So let's go through some course overview and some tips.

114
00:04:28.105 --> 00:04:30.925
So just some structure of General Maths.

115
00:04:30.925 --> 00:04:32.325
I think what's really important is that

116
00:04:32.765 --> 00:04:35.085
although this is now going into your second year

117
00:04:35.085 --> 00:04:38.765
of general maths, um, in the new study design, most

118
00:04:38.765 --> 00:04:41.365
of you are, I'm assuming a fair proportion

119
00:04:41.365 --> 00:04:42.805
of you probably have either older siblings

120
00:04:42.945 --> 00:04:44.085
or an older family friend

121
00:04:44.085 --> 00:04:46.325
or just an older friend who's given you some resources

122
00:04:46.325 --> 00:04:48.605
or your teacher's given out some older resources

123
00:04:48.605 --> 00:04:49.645
from students in the past.

124
00:04:50.775 --> 00:04:52.725
There are subtle changes to this study design

125
00:04:52.785 --> 00:04:56.365
and they have changed in the last 24 months, essentially.

126
00:04:56.385 --> 00:04:57.805
So last year was the first year.

127
00:04:58.035 --> 00:04:59.245
This year is our second year.

128
00:05:00.305 --> 00:05:02.365
Um, essentially what changed was we used

129
00:05:02.365 --> 00:05:03.405
to have these four modules.

130
00:05:03.865 --> 00:05:07.485
Now what happens is that we only have two modules.

131
00:05:07.665 --> 00:05:09.885
So you didn't do four modules.

132
00:05:10.035 --> 00:05:11.605
Your school chose two of them.

133
00:05:12.065 --> 00:05:15.925
Now, essentially VCAA is choosing two for you.

134
00:05:15.945 --> 00:05:19.125
You don't get a choice. So you have four topics overall.

135
00:05:19.705 --> 00:05:21.725
Now you have data analysis, which is the first one.

136
00:05:21.785 --> 00:05:23.565
It is worth double all of the other ones.

137
00:05:23.865 --> 00:05:27.525
So it is worth, think of it as worth 40% of the content.

138
00:05:27.915 --> 00:05:30.245
Each of the others is worth 20% of the content.

139
00:05:30.825 --> 00:05:32.365
All of that makes up your hundred percent.

140
00:05:33.225 --> 00:05:34.685
So you've got data analysis first,

141
00:05:34.685 --> 00:05:35.725
which is easily your biggest.

142
00:05:36.185 --> 00:05:39.165
You've got finance, which is arguably the most difficult.

143
00:05:39.385 --> 00:05:41.925
Um, and it'll be covered in the autumn lecture series.

144
00:05:42.635 --> 00:05:44.565
Then you've got matrices and you've got networks.

145
00:05:44.945 --> 00:05:47.805
Um, and these are really important that you understand

146
00:05:47.805 --> 00:05:50.205
that these are the four topics that you need to know.

147
00:05:50.385 --> 00:05:53.405
Now, also, as an extra point to that is that

148
00:05:53.975 --> 00:05:55.205
there are subtle changes.

149
00:05:55.355 --> 00:05:58.525
They're very subtle. Subjects did not change drastically,

150
00:05:58.525 --> 00:06:01.125
but there are subtle changes to each subject.

151
00:06:01.125 --> 00:06:02.125
There are some things that were

152
00:06:02.125 --> 00:06:03.365
removed, some things that were added.

153
00:06:03.785 --> 00:06:07.045
So really important that you, if you are using older notes,

154
00:06:07.465 --> 00:06:10.205
you are prepared to understand that some

155
00:06:10.205 --> 00:06:12.205
of the stuff you will look at will be relevant.

156
00:06:12.205 --> 00:06:13.845
Some of the stuff you look at will not be relevant.

157
00:06:15.525 --> 00:06:17.145
Um, and then you have two exams for this.

158
00:06:17.285 --> 00:06:18.945
So really important, there are two exams.

159
00:06:19.165 --> 00:06:20.905
The two exams consist of a multiple choice

160
00:06:20.905 --> 00:06:22.105
exam and a short answer exam.

161
00:06:22.105 --> 00:06:23.385
Both go for an hour and a half.

162
00:06:23.965 --> 00:06:25.785
Um, the multiple choice is worth 40

163
00:06:25.785 --> 00:06:26.825
marks, 40 multiple choice.

164
00:06:26.955 --> 00:06:28.425
Short answer is worth 60 marks.

165
00:06:28.425 --> 00:06:30.465
There is, uh, x amount of questions.

166
00:06:30.885 --> 00:06:34.585
Um, it can vary, but it's worth 60 marks in total.

167
00:06:34.845 --> 00:06:37.025
Um, so you make up to a hundred marks in total.

168
00:06:39.325 --> 00:06:43.545
Um, so also really important the study design,

169
00:06:43.545 --> 00:06:44.745
it changed in 2023.

170
00:06:44.925 --> 00:06:47.145
So really important that you are on top of that

171
00:06:47.165 --> 00:06:49.945
and that you read through it and that you do utilize this.

172
00:06:50.605 --> 00:06:52.745
Um, your study design, it's really important

173
00:06:52.825 --> 00:06:54.865
that you have this study design somewhere on your computer,

174
00:06:55.285 --> 00:06:57.145
um, so that you can access it

175
00:06:57.145 --> 00:06:58.665
and you can read through it when you need to

176
00:06:58.685 --> 00:06:59.745
and when you need to decipher

177
00:06:59.745 --> 00:07:00.865
what is relevant and what is not.

178
00:07:02.815 --> 00:07:05.615
Um, so couple of things.

179
00:07:05.615 --> 00:07:07.255
What changed last year, just to, just

180
00:07:07.255 --> 00:07:08.535
to solidify what has changed.

181
00:07:08.875 --> 00:07:10.135
So there was no more modules

182
00:07:10.135 --> 00:07:11.255
choice, which we've already discussed.

183
00:07:11.625 --> 00:07:14.255
There was removal of non causation effect in

184
00:07:14.255 --> 00:07:15.295
population statistics.

185
00:07:15.295 --> 00:07:17.575
So if you see things about population statistics

186
00:07:17.635 --> 00:07:19.175
and you're a bit confused what's going on,

187
00:07:19.545 --> 00:07:20.895
don't worry, that's been removed.

188
00:07:21.475 --> 00:07:24.415
Um, there was removal of simultaneous equations in matrices.

189
00:07:24.415 --> 00:07:25.535
Again, we're not gonna cover that today,

190
00:07:25.555 --> 00:07:27.615
but that was a massive part of matrices.

191
00:07:27.615 --> 00:07:29.375
So you'll find whenever you do older exams

192
00:07:29.375 --> 00:07:33.375
and you do matrices that there is a, there's a fair amount

193
00:07:33.375 --> 00:07:35.255
of effort put into simultaneous equations

194
00:07:35.595 --> 00:07:37.255
and representation of linear lines

195
00:07:37.315 --> 00:07:40.215
and you'll be like, oh, I have no idea what's going on.

196
00:07:40.945 --> 00:07:42.215
Don't worry, it's been removed.

197
00:07:42.235 --> 00:07:44.135
But it was super high yield content in the past.

198
00:07:44.275 --> 00:07:46.215
So it will commonly come up in

199
00:07:46.235 --> 00:07:47.615
old practice exams that you do.

200
00:07:49.185 --> 00:07:51.175
There was the addition of the Leslie Matrix, so this sort

201
00:07:51.175 --> 00:07:53.255
of made up for what they removed from matrices, um,

202
00:07:53.255 --> 00:07:56.455
the Leslie Matrix, um, it'll be covered in a later lecture,

203
00:07:56.675 --> 00:07:58.695
um, but you won't find practice questions on

204
00:07:58.695 --> 00:07:59.815
it past last year.

205
00:08:00.755 --> 00:08:02.215
Um, and then there was a name change.

206
00:08:02.315 --> 00:08:04.375
Now there was a couple of little things added into data,

207
00:08:04.515 --> 00:08:07.895
but they were only very subtle or just changing of wording.

208
00:08:07.955 --> 00:08:09.655
And so therefore I haven't really put it in this slide.

209
00:08:09.705 --> 00:08:11.575
We'll cover it today, so don't worry about that.

210
00:08:12.515 --> 00:08:14.335
Um, just a couple of other little tips.

211
00:08:14.765 --> 00:08:16.895
Your calculator, really, really important.

212
00:08:17.195 --> 00:08:20.855
Please get used to using the correct calculator.

213
00:08:20.855 --> 00:08:24.975
Please do not use the Casio or the TI-Nspire.

214
00:08:25.035 --> 00:08:28.855
So your white, your black TI-Nspire, no, sorry,

215
00:08:28.855 --> 00:08:29.895
please do use those.

216
00:08:30.025 --> 00:08:31.615
Sorry, other way around. I'm thinking about science.

217
00:08:31.615 --> 00:08:33.135
Please do get used to using those.

218
00:08:33.355 --> 00:08:34.935
Um, you don't have access to your little one.

219
00:08:34.955 --> 00:08:37.335
So I know a lot of you, which is why I get confused.

220
00:08:37.575 --> 00:08:39.335
I do apologize for that. You'll find

221
00:08:39.335 --> 00:08:41.015
that sometimes in sciences, which is,

222
00:08:41.315 --> 00:08:42.655
um, I manage chemistry.

223
00:08:42.655 --> 00:08:43.775
That's why I was thinking that other way.

224
00:08:44.355 --> 00:08:45.935
You only use that little one

225
00:08:46.635 --> 00:08:47.935
and a lot of people get really used to it

226
00:08:47.955 --> 00:08:49.695
and that's just what they like to use.

227
00:08:49.835 --> 00:08:51.135
You don't get that in the exam.

228
00:08:51.195 --> 00:08:54.535
You only get your TI-Nspire and you only get your Ca-,

229
00:08:54.635 --> 00:08:56.605
or your Casio, um, ClassPad.

230
00:08:56.785 --> 00:08:58.285
Please get used to using it.

231
00:08:58.585 --> 00:09:01.565
Um, you get it in both exams, you get it in all SACs.

232
00:09:01.595 --> 00:09:04.605
It's really important that you understand how to utilize it

233
00:09:04.605 --> 00:09:07.735
because it's gonna be your savior at the end of the day.

234
00:09:08.075 --> 00:09:09.975
Um, you may not use it for every question,

235
00:09:10.375 --> 00:09:11.695
I hope you don't use it for every question,

236
00:09:12.155 --> 00:09:14.615
but there will be questions where it will be far more useful

237
00:09:14.635 --> 00:09:15.815
and it'll save lots of time.

238
00:09:15.995 --> 00:09:17.735
It will be your best friend.

239
00:09:18.555 --> 00:09:22.375
Um, so always

240
00:09:23.165 --> 00:09:26.335
have some shortcuts, um, on your menu screen if you can.

241
00:09:26.395 --> 00:09:28.615
If you can't, make sure you've got some little points in

242
00:09:28.615 --> 00:09:32.215
your, um, in your summary book

243
00:09:32.635 --> 00:09:36.335
or your bound reference, whichever one you wanna call, um,

244
00:09:36.915 --> 00:09:39.455
please have some little menu sort of shortcuts in there

245
00:09:39.455 --> 00:09:41.775
because if you sort of get into the exam and you stumble

246
00:09:41.915 --> 00:09:43.215
and you might have a mind blank,

247
00:09:43.355 --> 00:09:44.895
you just wanna have those shortcuts in there so

248
00:09:44.895 --> 00:09:46.175
that you know where you're going and what to do.

249
00:09:46.195 --> 00:09:47.295
It is your best friend.

250
00:09:47.835 --> 00:09:51.935
Um, also from there, um,

251
00:09:53.065 --> 00:09:55.465
I sort of preface this with a little bit

252
00:09:55.465 --> 00:09:56.825
of a, you know, asterisk.

253
00:09:58.085 --> 00:10:00.125
I do say complete all the textbook exercises.

254
00:10:00.465 --> 00:10:03.885
Um, even if your teacher says, you know, leave some behind.

255
00:10:04.805 --> 00:10:06.805
I preface this with a little asterisk, do not do this

256
00:10:06.825 --> 00:10:08.565
as practice for an exam.

257
00:10:09.235 --> 00:10:10.845
This is the stuff that you do

258
00:10:11.435 --> 00:10:13.525
when you are learning the content.

259
00:10:13.705 --> 00:10:14.885
So as you go through,

260
00:10:14.885 --> 00:10:16.765
you're more than likely gonna get questions from your

261
00:10:16.765 --> 00:10:18.445
teacher or you're gonna get some textbook questions

262
00:10:18.665 --> 00:10:19.885
and they'll say, you know, do these

263
00:10:19.885 --> 00:10:21.245
ones, that's your homework.

264
00:10:21.525 --> 00:10:22.645
I need to see that to tick it off

265
00:10:22.645 --> 00:10:25.205
and say that you're going well. I would expect you to try

266
00:10:25.205 --> 00:10:27.845
and do all of it on the basis that when you are learning it,

267
00:10:28.065 --> 00:10:29.285
the more you do, the better.

268
00:10:29.795 --> 00:10:32.645
When you are doing practice for the exam, so at the end

269
00:10:32.645 --> 00:10:34.125
of the year and you're going for your exam

270
00:10:34.125 --> 00:10:36.125
and you're pushing through, please, please,

271
00:10:36.125 --> 00:10:39.445
please do not use, uh, textbook questions.

272
00:10:39.665 --> 00:10:41.845
Why is that? Textbook questions are

273
00:10:42.275 --> 00:10:43.445
useless at the end of the year.

274
00:10:43.505 --> 00:10:44.685
So I'm gonna get my little laser pointer

275
00:10:44.685 --> 00:10:45.765
out because I think it's gonna be more useful.

276
00:10:46.265 --> 00:10:49.045
Um, textbook questions at the end of the year are useless.

277
00:10:49.305 --> 00:10:51.125
Why is that? Because textbook questions are

278
00:10:51.125 --> 00:10:52.525
broken into sections.

279
00:10:52.745 --> 00:10:56.405
So they're in sections, they are, it's a section on, um,

280
00:10:56.965 --> 00:11:01.045
univariate data dot plots or bivariate data.

281
00:11:01.585 --> 00:11:04.365
Um, using your manipulation circle,

282
00:11:05.265 --> 00:11:07.165
um, to linearize data.

283
00:11:07.545 --> 00:11:09.285
You know what the topic is, you don't have

284
00:11:09.285 --> 00:11:11.325
to read a question and then decipher what I need to do.

285
00:11:11.835 --> 00:11:13.205
That is a really important skill

286
00:11:13.225 --> 00:11:16.005
and that is one of the hardest things in the exam is reading

287
00:11:16.165 --> 00:11:17.205
a long question

288
00:11:17.265 --> 00:11:19.365
and deciphering what I actually have to do.

289
00:11:19.785 --> 00:11:20.885
You don't do that in a textbook.

290
00:11:21.025 --> 00:11:24.715
The textbook sections tell you what to do as such.

291
00:11:25.215 --> 00:11:27.955
Please do not do textbook questions as exam study,

292
00:11:28.335 --> 00:11:31.195
but whilst you're learning a topic, it's really useful

293
00:11:31.195 --> 00:11:33.995
because they sort of hone you in on every little variation

294
00:11:33.995 --> 00:11:35.115
that the questions can have.

295
00:11:35.575 --> 00:11:36.995
And then when you go

296
00:11:36.995 --> 00:11:39.075
and do practice SACs, you then sort of have

297
00:11:39.075 --> 00:11:41.315
to decipher when I need to use those little skills.

298
00:11:42.615 --> 00:11:44.835
Um, so great for that first grasping.

299
00:11:45.215 --> 00:11:47.715
Um, I always think as well that you need to review sections.

300
00:11:47.775 --> 00:11:51.315
So especially with data, data is worth so much. Data, data,

301
00:11:51.315 --> 00:11:52.795
whichever one you wanna say, it's

302
00:11:52.795 --> 00:11:53.795
worth so much. And you

303
00:11:53.795 --> 00:11:54.605
do it at the start of the year

304
00:11:54.665 --> 00:11:56.285
and then a lot of students just forget about it.

305
00:11:56.345 --> 00:11:57.805
So they move forward and they go, all right,

306
00:11:57.955 --> 00:11:59.085
I'll come back to that at the end of the year.

307
00:11:59.145 --> 00:12:01.645
And then they forget it all. Always review sections,

308
00:12:01.645 --> 00:12:03.285
especially if there's an area in data

309
00:12:03.285 --> 00:12:05.645
that you're really weak at and maybe you do the SAC

310
00:12:05.645 --> 00:12:06.965
and you struggle with it and you get the question wrong,

311
00:12:07.785 --> 00:12:10.525
go back and review that section after the SAC, go back

312
00:12:10.525 --> 00:12:12.365
and review it a couple of weeks after the SAC, go

313
00:12:12.365 --> 00:12:13.925
and review it a couple of months after the SAC.

314
00:12:14.185 --> 00:12:15.525
Always keep on top of it.

315
00:12:15.705 --> 00:12:17.605
So always keep refreshing yourself on it

316
00:12:18.185 --> 00:12:20.045
and don't be scared to cut things outta your textbooks.

317
00:12:20.045 --> 00:12:22.085
You don't need to sell it at the end of the year, you know, oh,

318
00:12:22.085 --> 00:12:24.045
whoop-de-doo, I'm gonna miss out on my, my 20

319
00:12:24.045 --> 00:12:26.085
or $30 from selling my textbook at the end of the year.

320
00:12:26.585 --> 00:12:29.325
Not a big deal. At the end of the day, if you want

321
00:12:29.325 --> 00:12:31.725
to cut things outta your textbook, feel free to do it.

322
00:12:31.785 --> 00:12:34.245
If not, you can also, you know, photocopy, you're welcome

323
00:12:34.265 --> 00:12:36.565
to do that, but it's a really useful thing to do

324
00:12:37.145 --> 00:12:38.205
and to put into your,

325
00:12:38.795 --> 00:12:40.445
your summary book or your bound reference.

326
00:12:41.185 --> 00:12:43.085
Um, you can also use sort of external ones as well.

327
00:12:43.325 --> 00:12:44.805
I do obviously suggest the ATAR Notes.

328
00:12:44.805 --> 00:12:46.205
They're really, really good and they're,

329
00:12:46.555 --> 00:12:48.005
they're produced by students.

330
00:12:48.345 --> 00:12:50.365
So the questions are aimed at you.

331
00:12:50.365 --> 00:12:52.085
They know how they ask the questions,

332
00:12:52.545 --> 00:12:54.685
so they ask you the questions in that sort of way.

333
00:12:54.685 --> 00:12:56.245
Sometimes textbooks don't do that

334
00:12:56.245 --> 00:12:58.685
and that's why I suggest not using them for exam study.

335
00:13:00.455 --> 00:13:01.515
Um, and then summary book.

336
00:13:01.515 --> 00:13:02.635
Look, there are two sort

337
00:13:02.635 --> 00:13:05.235
of main ways you can do a study, uh, a summary book.

338
00:13:05.575 --> 00:13:07.235
I'm not gonna go through it in super detail,

339
00:13:07.615 --> 00:13:09.955
but think of it as doing a really big summary book versus

340
00:13:09.955 --> 00:13:11.435
doing a really small summary book.

341
00:13:11.815 --> 00:13:15.035
Now why I say there's maybe three is that you can sort

342
00:13:15.035 --> 00:13:16.235
of do a big summary book with a

343
00:13:16.235 --> 00:13:17.355
little summary book at the end.

344
00:13:17.495 --> 00:13:19.555
And what that means is you do your main summary book

345
00:13:19.555 --> 00:13:21.675
throughout the year, and then as you get close to the end

346
00:13:21.675 --> 00:13:24.515
of the year, you sort of summarize each topic in two pages.

347
00:13:25.065 --> 00:13:26.155
Literally two pages.

348
00:13:26.335 --> 00:13:28.475
I'm not saying four and I'm saying either side of it, like,

349
00:13:28.475 --> 00:13:30.595
you know, four one, that side of the paper, that side

350
00:13:30.595 --> 00:13:31.595
and then that side and that side.

351
00:13:31.655 --> 00:13:35.275
No, I'm saying a foldout, just those two pages.

352
00:13:35.905 --> 00:13:38.715
That is all you are gonna summarize the entire topic in.

353
00:13:38.775 --> 00:13:41.235
So you are gonna have four foldouts, you're gonna have one

354
00:13:41.455 --> 00:13:43.715
for data, you have the next one, you know, finance,

355
00:13:43.715 --> 00:13:44.795
you're gonna have the next one.

356
00:13:44.795 --> 00:13:46.155
You're gonna have, um,

357
00:13:46.275 --> 00:13:47.435
matrices, you're gonna have the next one.

358
00:13:47.435 --> 00:13:48.675
You have networks. Now you might

359
00:13:48.675 --> 00:13:49.995
say the data's worth double.

360
00:13:50.145 --> 00:13:52.515
Well, I actually think as much as it's worth double,

361
00:13:52.615 --> 00:13:54.595
you can actually summarize data pretty well.

362
00:13:54.865 --> 00:13:58.795
There's a lot of jargon in data that you can leave

363
00:13:58.795 --> 00:14:00.235
to your big summary book, but

364
00:14:00.235 --> 00:14:01.395
nonetheless, there are two ways of doing it.

365
00:14:01.395 --> 00:14:02.795
You take a really, really big one.

366
00:14:03.245 --> 00:14:05.395
These can be a bit of a pain to look through in a SAC

367
00:14:05.395 --> 00:14:06.915
or exam, but they do have everything.

368
00:14:06.915 --> 00:14:08.475
And if you're someone who really needs

369
00:14:08.475 --> 00:14:11.595
that extra support in your SACs, in your exams, feel free

370
00:14:11.595 --> 00:14:12.795
to make it as big as you want.

371
00:14:13.615 --> 00:14:16.755
Now, secondly, there's also the far smaller one.

372
00:14:16.755 --> 00:14:20.555
It's a couple of pages. Um, first it, first of all it says,

373
00:14:20.555 --> 00:14:23.035
you know, you can print it off from a summary sheet.

374
00:14:23.095 --> 00:14:27.155
You know, try not to print something off.

375
00:14:27.155 --> 00:14:28.555
Please try and make one yourself.

376
00:14:28.935 --> 00:14:30.955
Yes, the positives are that it's a lot smaller

377
00:14:31.055 --> 00:14:33.195
and it's really quick and you can sort of go through it

378
00:14:33.195 --> 00:14:34.475
and you have some practice questions there.

379
00:14:35.095 --> 00:14:38.485
Um, it's also really important to understand

380
00:14:38.485 --> 00:14:40.205
that it may not have everything if it's really small.

381
00:14:40.345 --> 00:14:41.925
And so therefore you may get to the exam

382
00:14:41.925 --> 00:14:46.155
and you may go, Ooh, not really too sure where to find that.

383
00:14:46.275 --> 00:14:47.235
I didn't put that in there because I

384
00:14:47.235 --> 00:14:48.315
keep it really summarized.

385
00:14:48.735 --> 00:14:50.515
So there are positives and negatives to it,

386
00:14:50.515 --> 00:14:52.125
but having a small one is not a bad thing.

387
00:14:52.125 --> 00:14:54.285
Sometimes people really like those smaller summaries

388
00:14:54.285 --> 00:14:55.845
and that's why you can sort of get the best

389
00:14:55.845 --> 00:14:57.645
of both worlds if you do that extra little bit.

390
00:14:58.735 --> 00:15:00.675
Um, but the golden rule is

391
00:15:00.675 --> 00:15:01.795
you need to create your own summary book.

392
00:15:01.815 --> 00:15:04.955
That's rule number one. You cannot get around that rule.

393
00:15:05.175 --> 00:15:06.275
You have to cover that rule.

394
00:15:07.695 --> 00:15:10.235
So they're all my tips and tricks.

395
00:15:10.235 --> 00:15:12.475
My other sort of tip that I sort of left out in here, when

396
00:15:12.475 --> 00:15:16.175
to sort of start looking at exams, it's kind of cool with,

397
00:15:16.285 --> 00:15:17.775
with general that you can start looking

398
00:15:17.795 --> 00:15:19.055
at exams pretty early.

399
00:15:19.155 --> 00:15:21.655
And what I mean by that is the exams are distinctly

400
00:15:21.675 --> 00:15:23.135
broken up into sections.

401
00:15:23.165 --> 00:15:24.855
It's not like a science exam

402
00:15:25.185 --> 00:15:27.455
where all the questions are mixed in together

403
00:15:27.595 --> 00:15:28.655
and topics overlap.

404
00:15:28.885 --> 00:15:32.295
It's not like, um, the methods exam where there's, you know,

405
00:15:32.385 --> 00:15:34.015
everything is sort of interlinked.

406
00:15:34.035 --> 00:15:35.055
You sort of learn one bit

407
00:15:35.055 --> 00:15:36.175
and then you build on it with the next

408
00:15:36.175 --> 00:15:37.335
topic and you build on the next topic.

409
00:15:37.355 --> 00:15:39.335
So essentially all the topics are just one topic.

410
00:15:39.835 --> 00:15:42.975
Um, it's not like your PE. It's the same thing as sort

411
00:15:42.975 --> 00:15:44.655
of like science and your humanities.

412
00:15:44.805 --> 00:15:46.775
They're all very similar in terms of they sort

413
00:15:46.775 --> 00:15:50.525
of build on top of each other with general,

414
00:15:51.275 --> 00:15:53.925
they break it up into distinct sections.

415
00:15:53.925 --> 00:15:56.525
There is a distinct data section,

416
00:15:56.525 --> 00:15:58.325
there is a distinct finance section.

417
00:15:58.325 --> 00:16:00.045
There is a distinct matrices section,

418
00:16:00.045 --> 00:16:02.845
there is a distinct network section.

419
00:16:03.095 --> 00:16:06.085
These are broken up individually

420
00:16:06.085 --> 00:16:07.805
and they have their own little spots.

421
00:16:09.315 --> 00:16:13.815
As such, you can therefore start doing exams

422
00:16:13.815 --> 00:16:15.775
and just do the sections that are relevant

423
00:16:15.795 --> 00:16:17.015
as soon as you've done one topic.

424
00:16:17.275 --> 00:16:20.215
Now, I don't suggest that. I suggest starting exams once

425
00:16:20.215 --> 00:16:21.215
you've finished unit three.

426
00:16:21.235 --> 00:16:22.775
And what that means is you've done finance

427
00:16:22.955 --> 00:16:27.015
and you've done, uh, you've done, done,

428
00:16:27.065 --> 00:16:28.135
sorry, you've done those two.

429
00:16:28.285 --> 00:16:31.015
Once you've done those two, you can start doing some exams,

430
00:16:31.035 --> 00:16:32.975
yes, but don't do the newer VCAA ones.

431
00:16:33.175 --> 00:16:35.295
I would even suggest sort of steering clear of VCAA,

432
00:16:35.505 --> 00:16:37.495
maybe do like a couple of really old ones,

433
00:16:38.555 --> 00:16:42.295
but beyond that you should really be starting exams once

434
00:16:42.295 --> 00:16:44.295
you've done unit three and you just do the half

435
00:16:44.295 --> 00:16:45.335
of the exam that's relevant.

436
00:16:45.355 --> 00:16:47.855
You just do data and you just do the finance.

437
00:16:48.075 --> 00:16:49.775
You don't do any of the matrices and stuff.

438
00:16:49.775 --> 00:16:51.375
You haven't done that yet. You haven't covered it in class.

439
00:16:51.375 --> 00:16:53.895
There's no point. But then once you've done matrices

440
00:16:53.895 --> 00:16:54.935
or networks, networks,

441
00:16:54.935 --> 00:16:56.375
which then whichever one you do first,

442
00:16:57.425 --> 00:16:58.885
you then go, all right, I'm gonna add that in.

443
00:16:59.025 --> 00:17:00.405
And so now I'm gonna do that one next.

444
00:17:00.505 --> 00:17:04.855
And now you've done those, you've done, you're doing three.

445
00:17:04.855 --> 00:17:06.575
So you're doing, you know, three quarters

446
00:17:06.575 --> 00:17:07.575
of your exam a little bit more.

447
00:17:07.995 --> 00:17:09.735
And then once you've added the last one,

448
00:17:09.735 --> 00:17:11.455
you're already smashing through practice exams

449
00:17:11.455 --> 00:17:12.575
and you just add in networks

450
00:17:12.575 --> 00:17:14.015
or matrices, whichever one you did last.

451
00:17:15.755 --> 00:17:18.395
So let's jump in

452
00:17:18.655 --> 00:17:20.955
to area study one Data.

453
00:17:21.695 --> 00:17:23.435
So we're gonna start off with univariate data.

454
00:17:23.555 --> 00:17:24.955
A lot of this is gonna feel like revision.

455
00:17:25.075 --> 00:17:26.515
A lot of this is gonna feel mundane.

456
00:17:27.455 --> 00:17:31.155
The reality is a lot of this content is quite mundane

457
00:17:31.255 --> 00:17:32.835
and you're gonna be sitting here like, oh,

458
00:17:33.465 --> 00:17:34.995
same thing over and over again.

459
00:17:35.265 --> 00:17:38.555
Reality is, it is. The summary book for this section,

460
00:17:38.975 --> 00:17:40.515
the big part of the summary book,

461
00:17:40.535 --> 00:17:42.555
not the little part is gonna be really large

462
00:17:42.975 --> 00:17:44.675
and then you can condense it down really well.

463
00:17:44.875 --> 00:17:46.275
'cause there's a lot of stuff here that's like,

464
00:17:46.915 --> 00:17:48.355
I don't really need it in my summary book,

465
00:17:48.375 --> 00:17:50.845
I'm gonna put it there just 'cause I want to have it there,

466
00:17:51.145 --> 00:17:53.645
but I don't really, in my little summary, I don't need it

467
00:17:53.645 --> 00:17:56.605
because it's stuff that you know I should know pretty well.

468
00:17:57.985 --> 00:18:01.295
So jumping in,

469
00:18:02.315 --> 00:18:04.815
we start off with, uh, data.

470
00:18:05.875 --> 00:18:07.615
So what is data?

471
00:18:07.615 --> 00:18:09.135
So data is all about facts

472
00:18:09.155 --> 00:18:12.175
and statistics, um, that are collected together, you know,

473
00:18:12.925 --> 00:18:14.265
for reference or analysis.

474
00:18:14.925 --> 00:18:16.145
And then we have variables.

475
00:18:16.205 --> 00:18:18.505
So what we measure to collect data.

476
00:18:18.925 --> 00:18:20.545
So what variables do we look at?

477
00:18:20.545 --> 00:18:24.465
What do we do to collect data and how many variables?

478
00:18:24.725 --> 00:18:29.305
So we then look at univariate, one, versus bivariate, two.

479
00:18:29.525 --> 00:18:32.345
And this is where things get a little bit more complicated.

480
00:18:32.685 --> 00:18:36.985
So as you can see here, we then can discuss, well,

481
00:18:36.985 --> 00:18:38.745
if we have univariate and we have bivariate,

482
00:18:38.935 --> 00:18:41.345
what if I have, you know, I have two pieces of data,

483
00:18:41.345 --> 00:18:42.465
but what if they're both different?

484
00:18:42.465 --> 00:18:43.785
Well then I can say, well, one of them might be

485
00:18:43.785 --> 00:18:45.585
categorical just names.

486
00:18:45.845 --> 00:18:49.785
And then one of them might be numerical, which is, you know,

487
00:18:50.095 --> 00:18:53.305
your values and your measurements.

488
00:18:54.685 --> 00:18:58.185
So from there, what else can we get?

489
00:18:58.775 --> 00:19:03.105
Well, we can get from here different subtypes.

490
00:19:03.165 --> 00:19:04.985
So we have categorical, we have numerical.

491
00:19:05.165 --> 00:19:07.945
Now we can break these down further,

492
00:19:08.245 --> 00:19:09.545
as I said, into our subtypes.

493
00:19:09.545 --> 00:19:10.585
So we'll start with categorical.

494
00:19:10.585 --> 00:19:12.665
So categorical is all about categories.

495
00:19:12.895 --> 00:19:14.225
Makes sense, it's called categorical.

496
00:19:14.375 --> 00:19:16.145
Categorical, it's called, it's got categories.

497
00:19:16.565 --> 00:19:17.625
Now the categories

498
00:19:17.625 --> 00:19:19.745
that we're gonna look at are things like

499
00:19:19.885 --> 00:19:21.825
eye color, football team, et cetera.

500
00:19:22.285 --> 00:19:24.385
Now what's really cool about those things is

501
00:19:24.965 --> 00:19:25.985
you can't really order them.

502
00:19:26.055 --> 00:19:28.265
Like sure, maybe like a football team, you can order it

503
00:19:28.265 --> 00:19:30.545
by the ladder, but you can't really order it.

504
00:19:30.545 --> 00:19:31.985
There's no way of ordering these things.

505
00:19:32.565 --> 00:19:34.865
But there are ways of ordering other categories.

506
00:19:35.805 --> 00:19:39.345
So order is our way of subcategorizing categorical things.

507
00:19:39.525 --> 00:19:41.185
So we have nominal, which is,

508
00:19:41.185 --> 00:19:43.305
there's no sensible way to sort these categories.

509
00:19:43.405 --> 00:19:44.905
So if I'm doing eye color,

510
00:19:45.005 --> 00:19:47.105
it doesn't really matter the order of it.

511
00:19:47.225 --> 00:19:49.185
I can do brown eyes first. I can do blue eyes first.

512
00:19:49.305 --> 00:19:51.585
I can do green eyes first. I can do aqua eyes first.

513
00:19:52.065 --> 00:19:53.145
I can do hazel eyes first.

514
00:19:53.245 --> 00:19:56.545
It doesn't really matter which order I put these in.

515
00:19:57.725 --> 00:20:00.785
So I go, you know, um, in this case I do hair color,

516
00:20:00.855 --> 00:20:03.825
I've got, you know, blonde, brown, red, black, et cetera.

517
00:20:03.975 --> 00:20:06.105
There's, there's no way of ordering that.

518
00:20:06.105 --> 00:20:08.425
However, if I do ordinal categorical,

519
00:20:08.655 --> 00:20:09.665
this is a little bit different.

520
00:20:10.755 --> 00:20:14.035
I can order some categorical pieces of data such

521
00:20:14.035 --> 00:20:17.155
as very satisfied, somewhat satisfied, neutral,

522
00:20:17.675 --> 00:20:19.315
somewhat dissatisfied, very dissatisfied.

523
00:20:19.385 --> 00:20:22.395
That has a general order to it. Yes, you could start at

524
00:20:22.395 --> 00:20:24.795
dissatisfied or satisfied, it doesn't really matter which way you

525
00:20:24.795 --> 00:20:27.155
start, but it has a general order to it.

526
00:20:29.545 --> 00:20:32.125
Now numerical data on the other hand can be measured.

527
00:20:32.265 --> 00:20:34.165
So categorical data cannot be measured.

528
00:20:34.225 --> 00:20:37.205
It needs to be sort of counted as,

529
00:20:37.425 --> 00:20:40.005
it doesn't even need to be counted, just needs to be obtained.

530
00:20:41.085 --> 00:20:43.405
Numerical data in a sense is measured or counted

531
00:20:44.585 --> 00:20:46.565
and we think of it in two distinct ways.

532
00:20:46.735 --> 00:20:48.485
There is discrete

533
00:20:49.105 --> 00:20:53.245
and then there is our continuous.

534
00:20:53.385 --> 00:20:55.085
Now what is the difference between the two?

535
00:20:55.595 --> 00:21:00.525
Well, discrete is more in the sense of something

536
00:21:00.525 --> 00:21:03.445
that we count and therefore it's usually whole numbers.

537
00:21:03.625 --> 00:21:06.925
Now, there are exceptions that allow some decimals in there,

538
00:21:07.385 --> 00:21:08.885
but it's more about counting.

539
00:21:09.285 --> 00:21:11.005
I can't say I have half a dog.

540
00:21:11.385 --> 00:21:14.765
If I'm gonna count the number of dogs people have

541
00:21:15.105 --> 00:21:16.645
or the number of pets people have,

542
00:21:17.045 --> 00:21:19.605
I can't say I have two and a half pets.

543
00:21:19.915 --> 00:21:21.045
That makes no sense.

544
00:21:21.705 --> 00:21:24.365
It needs to be two or it needs to be three.

545
00:21:24.535 --> 00:21:26.245
There is a distinct number.

546
00:21:27.295 --> 00:21:28.795
Now there are some exceptions to that.

547
00:21:29.055 --> 00:21:31.395
Things that are counted in point fives

548
00:21:31.495 --> 00:21:33.555
or maybe like, um, they've,

549
00:21:33.795 --> 00:21:36.075
a common example in the past has been an ATAR.

550
00:21:36.095 --> 00:21:40.635
So ATARs, uh, go up by 0.05s; that is discrete

551
00:21:40.635 --> 00:21:43.075
because you can only be on a 0.05.

552
00:21:43.295 --> 00:21:47.435
It goes, you know, 0.05, 0.10,

553
00:21:47.625 --> 00:21:50.155
0.15, 0.2.

554
00:21:50.855 --> 00:21:52.075
That's how it goes up.

555
00:21:52.535 --> 00:21:57.155
Um, like 99.95, 99.9, 99, uh,

556
00:21:57.155 --> 00:21:58.715
.85, 99.8.

557
00:21:58.785 --> 00:22:00.155
Like it counts by that.

558
00:22:00.625 --> 00:22:02.915
Usually though we say it needs to be a whole number.

559
00:22:03.725 --> 00:22:05.195
Continuous on the other hand is something

560
00:22:05.195 --> 00:22:07.355
that's more measured and it's something

561
00:22:07.355 --> 00:22:08.835
that usually they use.

562
00:22:08.835 --> 00:22:12.955
Weight, temperature, height, these things can go to

563
00:22:12.955 --> 00:22:14.195
as many decimal places as you want.

564
00:22:14.195 --> 00:22:15.475
Depends on how accurate you want to be.

565
00:22:15.695 --> 00:22:17.675
So if I weigh, you know, five people

566
00:22:17.935 --> 00:22:21.995
and I find they're, you know, 80.2, I find one

567
00:22:21.995 --> 00:22:26.675
of them's like 100.5, I find someone's like 89.32.

568
00:22:27.315 --> 00:22:29.675
I find someone that is like 92.67

569
00:22:29.735 --> 00:22:31.755
or I find someone that's like 50.25.

570
00:22:32.735 --> 00:22:34.435
You know, they, it's a measurement.

571
00:22:34.435 --> 00:22:35.435
They're different values.

572
00:22:35.435 --> 00:22:38.515
They can be anything they don't have to fit into, you know,

573
00:22:38.975 --> 00:22:41.435
1, 2, 3, they don't have to fit into that.

574
00:22:43.935 --> 00:22:45.475
So that is how we look at it.

575
00:22:45.475 --> 00:22:47.995
And this is a great way of sort of summarizing it.

576
00:22:47.995 --> 00:22:49.515
In your summary book, you have, you know,

577
00:22:49.515 --> 00:22:50.955
your categorical numerical

578
00:22:51.055 --> 00:22:53.115
and then you go, alright, well is there an order?

579
00:22:53.115 --> 00:22:54.955
Is there not an order? Do I count or do I measure?

580
00:22:55.335 --> 00:22:57.155
And that is how we look at these things.

581
00:22:58.575 --> 00:23:01.185
So another way of thinking about it

582
00:23:01.185 --> 00:23:03.105
as well is these questions that I'll pose to you.

583
00:23:03.405 --> 00:23:05.665
So my first question is, can it be manipulated?

584
00:23:05.885 --> 00:23:08.145
So can I manipulate this piece of data?

585
00:23:08.495 --> 00:23:11.105
Does it make sense to find the mean, median, mode,

586
00:23:11.155 --> 00:23:12.705
range, et cetera?

587
00:23:13.045 --> 00:23:14.785
Yes? Then it's numerical.

588
00:23:14.845 --> 00:23:17.985
If it's not, then it's, you know, it's categorical.

589
00:23:18.955 --> 00:23:20.975
It makes sense to subtract two heights from each other.

590
00:23:21.075 --> 00:23:22.935
It makes sense to find the median height,

591
00:23:23.075 --> 00:23:24.415
the mean height, et cetera.

592
00:23:24.555 --> 00:23:26.975
It doesn't make sense to find the mean, you know,

593
00:23:27.855 --> 00:23:29.895
football team, that, that doesn't make sense.

594
00:23:29.895 --> 00:23:31.655
There's no order to it, there's not even

595
00:23:32.295 --> 00:23:33.295
anything else going on with it.

596
00:23:33.475 --> 00:23:34.895
So you can't really do that.

597
00:23:37.225 --> 00:23:38.845
Um, can it be counted or measured?

598
00:23:39.025 --> 00:23:41.805
Yes, no, although numbers usually mean

599
00:23:41.805 --> 00:23:43.085
that a variable is numerical.

600
00:23:43.105 --> 00:23:45.365
It doesn't always. This is a really good point,

601
00:23:46.485 --> 00:23:49.645
a really common categorical ordinal.

602
00:23:49.985 --> 00:23:51.365
So it can have an order,

603
00:23:52.005 --> 00:23:54.045
a categorical ordinal piece of data.

604
00:23:54.555 --> 00:23:56.125
It's really common and comes up a lot.

605
00:23:57.075 --> 00:23:59.085
Postcodes, postcodes are a number.

606
00:23:59.585 --> 00:24:01.085
So you would think that's numerical.

607
00:24:01.235 --> 00:24:04.205
However, postcodes describe an area.

608
00:24:05.185 --> 00:24:09.325
So my postcode describes my area, so et cetera.

609
00:24:09.595 --> 00:24:14.365
For example, 3000 is Melbourne CBD, 2000 is Sydney CBD.

610
00:24:15.595 --> 00:24:19.445
They describe an area; they are not describing a value.

611
00:24:19.545 --> 00:24:21.685
Yes, they can be ordered because you can start at one

612
00:24:21.685 --> 00:24:24.365
and you can go to however, whatever the largest one is.

613
00:24:25.065 --> 00:24:26.525
But they describe the area.

614
00:24:26.665 --> 00:24:29.725
And as such that postcode is describing a

615
00:24:29.725 --> 00:24:31.045
categorical piece of information.

616
00:24:31.195 --> 00:24:33.965
It's a categorical ordinal piece of information.

617
00:24:35.835 --> 00:24:36.935
So this is that example.

618
00:24:36.935 --> 00:24:40.935
There is the number being used as a name. So postcode.

619
00:24:41.235 --> 00:24:43.295
Um, another way to think of this is a house number.

620
00:24:43.295 --> 00:24:44.855
Another way to think of this is like a rating.

621
00:24:45.075 --> 00:24:47.415
So five stars. Um, a really common one

622
00:24:47.415 --> 00:24:48.655
as well is like shoe size.

623
00:24:49.115 --> 00:24:52.695
So shoe size that comes in numbers, um, starts at zero,

624
00:24:52.765 --> 00:24:56.615
goes up to ginormous sizes as it's like 20 or something.

625
00:24:57.325 --> 00:25:01.295
They are a number, but they are utilized to

626
00:25:01.815 --> 00:25:03.655
describe a size.

627
00:25:04.355 --> 00:25:06.095
So they're utilized to describe a size and

628
00:25:06.095 --> 00:25:08.495
therefore we would call that ordinal categorical.
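
NOTE A minimal sketch, not from the lecture slides, of the questions above written as Python. The function name `classify_variable` and its three yes/no parameters are assumptions made for illustration; the answers themselves still have to come from thinking about what the variable describes.
```python
def classify_variable(arithmetic_makes_sense: bool, has_natural_order: bool = False, is_counted: bool = False) -> str:
    """Hypothetical helper: map the lecture's yes/no questions to a data type."""
    if arithmetic_makes_sense:
        # Numerical: counted values are discrete, measured values are continuous.
        return "numerical discrete" if is_counted else "numerical continuous"
    # Categorical: a sensible order makes it ordinal, no order makes it nominal.
    return "categorical ordinal" if has_natural_order else "categorical nominal"
examples = {
    "eye colour": classify_variable(False),
    "satisfaction rating": classify_variable(False, has_natural_order=True),
    "number of pets": classify_variable(True, is_counted=True),
    "weight": classify_variable(True),
}
for name, data_type in examples.items():
    print(f"{name}: {data_type}")
```
A number used as a name, such as a postcode, goes in with `arithmetic_makes_sense=False`, because a mean postcode tells you nothing.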

629
00:25:10.275 --> 00:25:12.135
So here's a really quick question for you.

630
00:25:12.475 --> 00:25:14.295
Um, if you're playing along at home,

631
00:25:14.825 --> 00:25:18.015
pause your video now just quickly and go through

632
00:25:18.475 --> 00:25:19.775
and name each of these.

633
00:25:20.155 --> 00:25:21.215
So I'd go through the first one

634
00:25:21.215 --> 00:25:22.855
and say, all right, how often do you study?

635
00:25:22.905 --> 00:25:24.055
Often, sometimes rarely.

636
00:25:24.085 --> 00:25:26.855
Well, if I said, how much did I study in hours?

637
00:25:27.095 --> 00:25:29.295
I think that's sort of like a categor, I think

638
00:25:29.295 --> 00:25:31.335
that's like a numerical. Could be continuous

639
00:25:31.335 --> 00:25:32.615
if they're asking for specifics,

640
00:25:32.615 --> 00:25:34.415
but it could also be discrete if they're just asking

641
00:25:34.515 --> 00:25:37.575
to the hour, but they're actually asking it in terms

642
00:25:37.575 --> 00:25:38.895
of often, sometimes, rarely.

643
00:25:38.895 --> 00:25:40.655
Therefore, this is categorical

644
00:25:40.755 --> 00:25:43.695
and that does have an order, so it's categorical,

645
00:25:43.965 --> 00:25:45.055
ordinal, et cetera.

646
00:25:45.635 --> 00:25:47.455
Go to the next one. Temperature in degrees Celsius.

647
00:25:47.455 --> 00:25:49.375
Well, this is a measurement, so I'm gonna say this is a

648
00:25:49.495 --> 00:25:51.615
numerical, continuous, et cetera.

649
00:25:51.725 --> 00:25:53.535
Move on. I want you to go through and do the rest of them.

650
00:25:53.555 --> 00:25:55.295
So pause 3, 2, 1,

651
00:25:56.905 --> 00:25:58.605
and hopefully you've unpaused and you're back.

652
00:25:59.025 --> 00:26:00.405
Um, hopefully you had a go at this one.

653
00:26:00.465 --> 00:26:02.125
And you've got the answers here.

654
00:26:04.125 --> 00:26:06.305
So another one here, this is from the 2016

655
00:26:06.335 --> 00:26:08.825
Further exam one. So multiple choice exam.

656
00:26:09.045 --> 00:26:10.625
We have the variables of blood pressure

657
00:26:10.645 --> 00:26:13.705
and age are which of the following.

658
00:26:14.085 --> 00:26:16.145
So what's really cool about this question

659
00:26:16.165 --> 00:26:18.145
or what was not very cool about this question was only

660
00:26:18.145 --> 00:26:19.465
31% of people got this right.

661
00:26:20.045 --> 00:26:22.545
Um, so this is more of a difficult question.

662
00:26:23.085 --> 00:26:28.035
So have a quick think. 3, 2, 1.

663
00:26:29.135 --> 00:26:32.795
All right. So this question here was a little bit more

664
00:26:32.795 --> 00:26:35.075
difficult in terms of that.

665
00:26:35.655 --> 00:26:39.115
Um, we find that there are two variables,

666
00:26:39.135 --> 00:26:43.805
and one of them is very much sort of a categorical ordinal

667
00:26:43.805 --> 00:26:45.845
because as you can see here, we've been told

668
00:26:45.845 --> 00:26:48.845
that it's low, normal, high; it's clearly categorical,

669
00:26:48.955 --> 00:26:53.505
it's clearly describing, um, what's going on

670
00:26:54.005 --> 00:26:55.265
and a low, medium high.

671
00:26:55.485 --> 00:26:58.585
Now then we have under 50 and over 50.

672
00:26:58.765 --> 00:27:00.105
Now what's really,

673
00:27:00.495 --> 00:27:02.185
what was really interesting about this question is a fair

674
00:27:02.345 --> 00:27:06.545
proportion of people thought this might have been a, um, a,

675
00:27:06.805 --> 00:27:10.495
uh, sorry, a numerical piece of data.

676
00:27:10.915 --> 00:27:12.695
So they thought this could have been a

677
00:27:12.695 --> 00:27:13.815
continuous piece of data.

678
00:27:14.435 --> 00:27:16.495
Um, and so a lot of people went for E

679
00:27:16.675 --> 00:27:18.175
and then a lot of people also went for D

680
00:27:18.175 --> 00:27:19.375
because we had an ordinal.

681
00:27:19.515 --> 00:27:20.815
So we clearly had an ordinal.

682
00:27:20.875 --> 00:27:23.015
So that's covered with our low normal high.

683
00:27:23.275 --> 00:27:25.535
But a lot of people said, oh, well under and over 50,

684
00:27:25.565 --> 00:27:26.655
it's just two categories.

685
00:27:26.715 --> 00:27:27.855
So we're just gonna say this is

686
00:27:27.855 --> 00:27:29.375
nominal, but it actually wasn't.

687
00:27:29.375 --> 00:27:31.135
It's also ordinal because you can order that.

688
00:27:31.135 --> 00:27:33.175
You can say, well, if you're under 50, that's at the start.

689
00:27:33.195 --> 00:27:35.455
If you're over 50, that's at the end, you can order that.

690
00:27:35.875 --> 00:27:37.655
So the answer to this was actually B.

691
00:27:38.115 --> 00:27:39.695
So both ordinal variables

692
00:27:39.695 --> 00:27:41.295
and only 31% of people got this right.

693
00:27:42.105 --> 00:27:44.965
That's 31% of the cohort got this question right,

694
00:27:45.255 --> 00:27:47.365
which was question two on the exam.

695
00:27:47.705 --> 00:27:49.325
So usually you will find,

696
00:27:49.355 --> 00:27:51.925
it's something I didn't talk about, the tips with the exams,

697
00:27:52.025 --> 00:27:53.765
the easier questions are at the start,

698
00:27:53.945 --> 00:27:57.885
the harder questions are at the end, um, of the groupings.

699
00:27:58.065 --> 00:28:00.765
So you'll have, you know, your 16 data questions

700
00:28:00.765 --> 00:28:02.085
and you have eight of all the others.

701
00:28:02.585 --> 00:28:06.405
Um, and usually the first two questions will be easier

702
00:28:06.585 --> 00:28:07.965
and usually the last two

703
00:28:07.965 --> 00:28:09.365
or three questions will be a lot harder.

704
00:28:09.385 --> 00:28:10.685
The middle questions usually vary,

705
00:28:11.145 --> 00:28:13.245
but it always tends, it just happens to be

706
00:28:13.245 --> 00:28:14.645
that the first two questions are a little bit more

707
00:28:14.845 --> 00:28:17.005
straightforward than the last two or three questions.

708
00:28:17.025 --> 00:28:18.405
It always happens to be that way.

709
00:28:18.915 --> 00:28:20.165
This is essentially one

710
00:28:20.165 --> 00:28:22.005
of their more straightforward questions on the exam.

711
00:28:22.345 --> 00:28:23.965
And 31% of people got this right.

712
00:28:24.025 --> 00:28:26.085
So if you are able to get your head around this concept

713
00:28:26.305 --> 00:28:29.285
and understand it, you're already gonna push yourself ahead

714
00:28:29.285 --> 00:28:31.085
of, at a minimum, 50% of the cohort.

715
00:28:31.505 --> 00:28:32.885
Um, and that's what you really wanna be doing.

716
00:28:35.105 --> 00:28:36.685
So now that we've looked at that,

717
00:28:36.775 --> 00:28:38.765
let's move into univariate data.

718
00:28:38.945 --> 00:28:43.605
So univariate means one variable, so uni means one, um,

719
00:28:43.785 --> 00:28:45.085
and it can only,

720
00:28:45.625 --> 00:28:48.085
only one thing changes or is manipulated.

721
00:28:48.425 --> 00:28:50.365
So as such you say, what's your favorite color?

722
00:28:50.545 --> 00:28:52.405
And then you just count the number of people.

723
00:28:52.705 --> 00:28:56.685
So this would be sort of a, um, a numerical discrete sort

724
00:28:56.685 --> 00:28:59.165
of piece of information, or actually it probably wouldn't be

725
00:28:59.165 --> 00:29:01.285
because you, you've got a category here,

726
00:29:01.505 --> 00:29:02.965
so you're probably looking at categorical,

727
00:29:03.065 --> 00:29:05.005
you're actually looking at categorical data here.

728
00:29:05.705 --> 00:29:08.505
Um, so what you can see,

729
00:29:08.535 --> 00:29:10.385
your variable is actually in the left column.

730
00:29:10.405 --> 00:29:12.225
And so you are looking at categorical data here.

731
00:29:12.925 --> 00:29:15.625
Um, and then you have bivariate data, which we'll cover in a bit.

732
00:29:15.685 --> 00:29:18.185
But essentially bivariate data is two variables.

733
00:29:18.385 --> 00:29:20.745
Bi means two, um, there are two things that change

734
00:29:20.805 --> 00:29:22.065
or can be manipulated.

735
00:29:23.365 --> 00:29:25.465
Um, and these two things can change

736
00:29:25.565 --> 00:29:27.705
or be manipulated in different ways.

737
00:29:27.965 --> 00:29:29.985
You might actually have numerical versus just

738
00:29:30.085 --> 00:29:31.825
versus categorical data.

739
00:29:31.845 --> 00:29:32.705
You might have two pieces of

740
00:29:32.705 --> 00:29:34.065
categorical data like you've got here.

741
00:29:34.365 --> 00:29:36.825
So you've got a year level, so categorical ordinal

742
00:29:36.825 --> 00:29:39.425
because it can be ordered and you've got categorical, um,

743
00:29:41.325 --> 00:29:45.405
hierarchal, nominal, sorry, uh, which is your non, uh,

744
00:29:46.195 --> 00:29:49.325
your non sort of way of, uh, manipulating it.

745
00:29:49.665 --> 00:29:52.845
Um, because there your non sort of ordering way, sorry,

746
00:29:52.845 --> 00:29:55.885
because there's no way of ordering this piece of data here.

747
00:29:57.885 --> 00:30:00.985
So bivariate data is super interesting, more on this later.

748
00:30:01.645 --> 00:30:04.505
So let's start off with what type of graph do we use?

749
00:30:04.505 --> 00:30:06.025
So this is a really good little summary

750
00:30:06.245 --> 00:30:08.425
to go in the smaller side of the summary book.

751
00:30:08.725 --> 00:30:10.585
So a really good summary to say, all right,

752
00:30:10.585 --> 00:30:13.345
if I've got categorical univariate data, it's gonna go in one

753
00:30:13.345 --> 00:30:15.345
of these three types of graphs or tables.

754
00:30:15.725 --> 00:30:20.025
Now, if I actually have numerical data that is, um,

755
00:30:20.205 --> 00:30:24.225
as you can see here, one of those 1, 2, 3, 4, 5

756
00:30:25.265 --> 00:30:27.585
graphs or tables can be utilized

757
00:30:28.205 --> 00:30:30.865
to display numerical univariate data.

758
00:30:32.175 --> 00:30:36.395
So below you can see here you've got a frequency table.

759
00:30:36.655 --> 00:30:38.915
So frequency tables are really interesting

760
00:30:38.915 --> 00:30:42.715
because they can just, they're just the raw data shown

761
00:30:42.715 --> 00:30:44.675
to you in the easiest way they can.

762
00:30:44.855 --> 00:30:47.075
So raw data is data that hasn't been manipulated,

763
00:30:47.145 --> 00:30:48.835
it's just obtained and it's just

764
00:30:48.865 --> 00:30:49.955
what they had on their sheet.

765
00:30:50.415 --> 00:30:53.275
You are generally gonna obtain your, your data

766
00:30:53.735 --> 00:30:56.715
for something like this in a frequency table.

767
00:30:56.855 --> 00:30:59.235
So it's the easiest and most simple way of displaying it.

768
00:30:59.505 --> 00:31:02.075
Usually you don't want it displayed in a frequency table.

769
00:31:02.655 --> 00:31:05.035
Um, the only way a frequency table can get a little bit more

770
00:31:05.035 --> 00:31:07.155
complex is if you put a percentage in it.

771
00:31:07.255 --> 00:31:10.155
So a percentage frequency table is essentially when you get

772
00:31:10.155 --> 00:31:11.555
the whole cohort and you find out

773
00:31:11.555 --> 00:31:13.515
what percentage each one was,

774
00:31:13.815 --> 00:31:15.635
and it should total out to a hundred percent.

775
00:31:15.735 --> 00:31:17.515
Now you can add this as an extra column,

776
00:31:17.735 --> 00:31:19.835
or we can just make it separate like this one here.
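
NOTE A minimal sketch of building a percentage frequency table in Python. The helper name `percentage_frequency` and the pet counts are made up for illustration and are not the data on the slide; the idea is that each frequency is divided by the total and the percentages should add to 100.
```python
from collections import Counter
def percentage_frequency(values):
    """Hypothetical helper: map each category to (frequency, percentage frequency)."""
    counts = Counter(values)  # raw frequency of each category
    total = sum(counts.values())  # size of the whole cohort
    return {category: (count, round(100 * count / total, 1)) for category, count in counts.items()}
pets = ["cat"] * 5 + ["dog"] * 10 + ["bird"] * 3 + ["fish"] * 2  # made-up data
table = percentage_frequency(pets)
for category, (frequency, percent) in table.items():
    print(f"{category}: {frequency} ({percent}%)")
```
With rounded percentages the total can land slightly off 100, so it is worth checking the sum before you quote it.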

777
00:31:21.575 --> 00:31:24.235
The most common way of displaying categorical univariate data

778
00:31:24.455 --> 00:31:26.355
is through a bar chart.

779
00:31:26.535 --> 00:31:28.715
Now, a bar chart is distinctly different to a histogram

780
00:31:28.715 --> 00:31:31.955
and we will discuss that, um, when we get to our histograms,

781
00:31:32.375 --> 00:31:33.555
but it's really important to understand

782
00:31:33.555 --> 00:31:37.505
that categorical data has this sort of, um,

783
00:31:37.845 --> 00:31:41.465
broken up approach because the categories, ordered

784
00:31:41.525 --> 00:31:42.625
or not, are separate.

785
00:31:42.735 --> 00:31:45.025
They don't sort of follow along by numbers.

786
00:31:45.765 --> 00:31:47.105
We need to break them apart.

787
00:31:47.165 --> 00:31:49.465
We need to have gaps between our bars.

788
00:31:49.765 --> 00:31:53.345
So the bar chart, there is a gap between each bar.

789
00:31:54.615 --> 00:31:57.985
Also in a bar chart we usually try and keep it as one color.

790
00:31:58.445 --> 00:32:01.105
So we usually try and keep each of the bars the same color,

791
00:32:01.335 --> 00:32:02.465
they're the same width

792
00:32:02.805 --> 00:32:05.545
and they're the same distance apart from each other.

793
00:32:05.885 --> 00:32:07.305
The only difference is their height,

794
00:32:07.435 --> 00:32:09.105
which is obviously their frequency,

795
00:32:09.105 --> 00:32:11.745
because on the y-axis you've got your frequency,

796
00:32:11.745 --> 00:32:14.465
and on the x axis you've got your, your categories.

797
00:32:14.465 --> 00:32:16.385
So your type of pet or et cetera,

798
00:32:16.385 --> 00:32:19.305
or whatever it is. As you see here, pets at home

799
00:32:19.885 --> 00:32:23.585
and you go, all right, I've got cats, dogs,

800
00:32:23.585 --> 00:32:24.665
birds, fish, et cetera.

801
00:32:24.685 --> 00:32:29.105
And you then do your frequency on your y-axis.

802
00:32:30.925 --> 00:32:33.705
So categorical data is also really interesting

803
00:32:33.705 --> 00:32:36.305
because it can be described in a really sort of succinct

804
00:32:36.645 --> 00:32:37.905
and straightforward way.

805
00:32:38.205 --> 00:32:40.325
So there's a template that I want you

806
00:32:40.325 --> 00:32:41.605
to use in your summary book as to

807
00:32:41.605 --> 00:32:44.845
how you're gonna answer all categorical univariate data

808
00:32:45.125 --> 00:32:46.205
questions where they want you to explain it.

809
00:32:46.205 --> 00:32:47.405
They want you to write a paragraph.

810
00:32:47.405 --> 00:32:49.565
They want you to say, Hey, what actually

811
00:32:49.565 --> 00:32:50.725
does that graph show you?

812
00:32:50.725 --> 00:32:51.805
What does that, what is that graph

813
00:32:51.805 --> 00:32:53.085
telling you about what's going on?

814
00:32:53.585 --> 00:32:55.925
Um, so you must essentially

815
00:32:56.605 --> 00:32:57.805
summarize the context of the data.

816
00:32:57.875 --> 00:32:59.605
It's always the first thing that I think you should do.

817
00:32:59.785 --> 00:33:02.125
You should always summarize what is going on in your data.

818
00:33:02.225 --> 00:33:03.805
So give some context, what is it,

819
00:33:03.805 --> 00:33:05.485
what are we actually measuring or what are we counting?

820
00:33:05.515 --> 00:33:07.885
What are we taking the frequency of what's going on?

821
00:33:08.805 --> 00:33:10.825
So obviously you won't be measuring or counting

822
00:33:10.825 --> 00:33:11.905
with the categorical, but

823
00:33:11.905 --> 00:33:13.185
nonetheless you're saying, all right,

824
00:33:13.185 --> 00:33:16.065
I'm getting the frequency of the number of pet,

825
00:33:16.325 --> 00:33:17.985
of the not number of pets, the type

826
00:33:17.985 --> 00:33:19.145
of pets people have in my class.

827
00:33:20.005 --> 00:33:21.425
Um, you wanna identify the mode.

828
00:33:21.525 --> 00:33:23.425
So usually what's the most dominant category

829
00:33:23.765 --> 00:33:25.025
and then quote its frequency

830
00:33:25.045 --> 00:33:26.705
and then quote other frequencies of interest.

831
00:33:27.245 --> 00:33:28.985
So let's have a look at this question here.

832
00:33:29.105 --> 00:33:30.345
I want you to comment on the data shown

833
00:33:30.345 --> 00:33:31.425
in the frequency table below.

834
00:33:32.575 --> 00:33:35.075
So if you wanna have a go at this, please feel free.

835
00:33:35.455 --> 00:33:38.475
Um, I'll pause it here or you can pause it here.

836
00:33:38.495 --> 00:33:42.675
So pause here and take, you know, a good 30 seconds

837
00:33:42.775 --> 00:33:45.115
or so, maybe a little bit more like a minute

838
00:33:45.335 --> 00:33:48.035
and write out an answer that gives all three

839
00:33:48.175 --> 00:33:49.675
or four of those points: context.

840
00:33:50.825 --> 00:33:53.795
What is the most dominant category, what is its frequency,

841
00:33:54.625 --> 00:33:57.715
what are some other points of interest?

842
00:33:57.855 --> 00:34:01.035
So here we've got, alright, so 3, 2, 1,

843
00:34:01.035 --> 00:34:02.995
pause, hopefully you're back.

844
00:34:03.455 --> 00:34:06.315
So here we've got three climates, hot, mild, cold.

845
00:34:06.315 --> 00:34:08.195
We're given the frequency, we're given the percentage frequency.

846
00:34:08.815 --> 00:34:11.355
So we'll start off with this is what our answer looks like,

847
00:34:11.655 --> 00:34:12.955
but how do we break this down?

848
00:34:12.955 --> 00:34:14.115
Well, first of all, we had context.

849
00:34:14.415 --> 00:34:17.555
The climate types of 23 countries were classified

850
00:34:17.555 --> 00:34:18.995
as being hot, mild, or cold.

851
00:34:19.175 --> 00:34:20.635
So we've given context.

852
00:34:20.965 --> 00:34:25.515
There were 23 countries that were um, that were,

853
00:34:25.935 --> 00:34:27.515
you know, surveyed in this sense.

854
00:34:27.815 --> 00:34:29.675
Um, we weren't given the context that they were,

855
00:34:30.415 --> 00:34:31.635
um, in the question.

856
00:34:32.215 --> 00:34:33.875
You weren't given the context of these were countries.

857
00:34:33.875 --> 00:34:35.395
So if you didn't mention countries, that's okay.

858
00:34:35.695 --> 00:34:37.635
Um, but obviously in this question we're talking about

859
00:34:37.835 --> 00:34:38.875
countries, um,

860
00:34:39.575 --> 00:34:43.195
and we wanted to classify them as being hot, uh, hot, mild

861
00:34:43.195 --> 00:34:44.315
or cold or cold, mild hot,

862
00:34:44.315 --> 00:34:45.675
whichever way you talk about it, it doesn't really matter.

863
00:34:46.425 --> 00:34:48.675
Nonetheless, that's a good way of sort of looking at it.

864
00:34:49.305 --> 00:34:50.475
Then we say the majority

865
00:34:50.495 --> 00:34:53.915
of countries, 60.8%, were found to have a mild climate.

866
00:34:53.935 --> 00:34:56.115
Now, really important percentage frequency

867
00:34:56.635 --> 00:34:57.915
dominates standard frequency.

868
00:34:58.015 --> 00:35:00.875
So please use percentage frequency if it is available.

869
00:35:01.135 --> 00:35:03.835
If it's not available, you can use your common frequency.

870
00:35:04.175 --> 00:35:07.555
But I always suggest using percentage frequency over common

871
00:35:07.635 --> 00:35:09.755
frequency because it's already been manipulated,

872
00:35:09.825 --> 00:35:11.035
it's a better piece of data,

873
00:35:11.105 --> 00:35:12.395
it's a better piece of information.

874
00:35:12.815 --> 00:35:16.355
Um, we like it more. It gives and provides more information.

875
00:35:16.815 --> 00:35:19.515
Um, so therefore I always say, you know what,

876
00:35:19.535 --> 00:35:21.195
if it's there, utilize it.

877
00:35:22.695 --> 00:35:25.475
And then we've got, um, our

878
00:35:26.915 --> 00:35:28.105
other pieces of information.

879
00:35:28.105 --> 00:35:29.265
So note other frequencies.

880
00:35:29.365 --> 00:35:30.705
So of the remaining countries,

881
00:35:30.705 --> 00:35:32.985
26.1% were found to have a hot climate while

882
00:35:32.985 --> 00:35:34.505
13% were found to have a cold climate.

883
00:35:34.525 --> 00:35:36.265
So it's just commented on the other frequencies.

884
00:35:36.285 --> 00:35:38.245
Now this table might've been much larger,

885
00:35:38.295 --> 00:35:39.325
there might've been, you know,

886
00:35:39.335 --> 00:35:40.685
eight categories or something.

887
00:35:41.225 --> 00:35:42.805
If it's much larger, you don't need

888
00:35:42.805 --> 00:35:44.245
to comment on every single one.

889
00:35:44.275 --> 00:35:48.445
Just ones of note, you could say the least common

890
00:35:49.115 --> 00:35:51.005
climate was cold with 13%

891
00:35:51.145 --> 00:35:54.445
or the next most common climate was mild

892
00:35:55.185 --> 00:35:57.165
or hot, sorry, with 26.1%.

893
00:35:57.165 --> 00:35:59.605
So you just give what is the next, what is the least

894
00:35:59.665 --> 00:36:01.085
or just something that's interesting.

895
00:36:01.425 --> 00:36:04.605
Um, you don't want to say every single thing. In this one here

896
00:36:04.625 --> 00:36:06.645
you can because there's not a lot of information,

897
00:36:07.625 --> 00:36:09.205
but you don't wanna go out there and just write every

898
00:36:09.205 --> 00:36:10.445
single thing down on the page.
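
NOTE
Editor's sketch of the context / mode / other-frequencies template described above.
The counts 6, 14 and 3 are an assumption chosen to roughly match the quoted percentages
for 23 countries; they are not read from the slide, so rounding may differ slightly.
```python
# Assumed counts: 6 hot, 14 mild, 3 cold (23 countries in total).
counts = {"hot": 6, "mild": 14, "cold": 3}
total = sum(counts.values())
# Percentage frequency of each category.
percent = {climate: 100 * n / total for climate, n in counts.items()}
# The mode is the category with the highest frequency.
mode = max(counts, key=counts.get)
least = min(counts, key=counts.get)
summary = (
    f"The climate types of {total} countries were classified as hot, mild or cold. "
    f"The majority, {percent[mode]:.1f}%, have a {mode} climate. "
    f"The least common climate was {least} with {percent[least]:.1f}%."
)
print(summary)
```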

899
00:36:12.425 --> 00:36:14.965
So then we move into numerical.

900
00:36:15.265 --> 00:36:17.325
So numerical also really interesting.

901
00:36:17.585 --> 00:36:20.485
You can also display numerical data in a frequency table.

902
00:36:20.855 --> 00:36:22.525
Again, this is your last resort,

903
00:36:22.545 --> 00:36:23.885
you don't really wanna be doing this.

904
00:36:24.265 --> 00:36:26.845
Um, so it's sort of the last resort as such.

905
00:36:27.225 --> 00:36:30.905
Um, and you can sort of put it into a frequency table.

906
00:36:30.925 --> 00:36:32.545
You can also sort of range it.

907
00:36:33.565 --> 00:36:35.345
So in this case here you can use ranges.

908
00:36:35.765 --> 00:36:38.065
Um, think of this as like intervals.

909
00:36:38.365 --> 00:36:40.825
Um, you're also welcome to utilize this with tables,

910
00:36:41.085 --> 00:36:43.785
but again, last resort we'd rather use like a dot plot.

911
00:36:45.895 --> 00:36:49.835
So if we look at this, here we go, all right, dot plots used

912
00:36:49.835 --> 00:36:54.395
for discrete numerical data. Only the x-axis is used.

913
00:36:54.455 --> 00:36:56.235
So we don't have a y-axis.

914
00:36:56.655 --> 00:36:59.275
Um, and the number of dots represents the frequency.

915
00:36:59.375 --> 00:37:01.275
So essentially we have pieces of information

916
00:37:01.275 --> 00:37:03.755
that are on 1, 2, 3, 4, 5, sorry,

917
00:37:03.755 --> 00:37:04.915
that's sort of been squished down.

918
00:37:05.295 --> 00:37:07.115
Um, didn't really hold its position well.

919
00:37:07.615 --> 00:37:10.555
Um, but as you can see here, we have four on one,

920
00:37:10.575 --> 00:37:13.515
we have two on two, we have one on three, et cetera.

921
00:37:13.735 --> 00:37:15.835
So what we're looking at with the dot plot is just

922
00:37:16.595 --> 00:37:18.215
we are looking at sort of how many pieces

923
00:37:18.215 --> 00:37:19.295
of information are on each one.

924
00:37:19.355 --> 00:37:21.935
We can then sort of find a mean median or a mode, et cetera.

925
00:37:22.085 --> 00:37:24.255
This would really be used for numerical

926
00:37:24.455 --> 00:37:27.895
discrete rather than numerical continuous. Numerical

927
00:37:27.895 --> 00:37:30.615
continuous is probably not all that useful on this one.

928
00:37:30.615 --> 00:37:33.445
Here you can find the median.

929
00:37:33.785 --> 00:37:35.605
Um, you can do that by counting the dots

930
00:37:35.605 --> 00:37:36.605
and then finding the middle value.

931
00:37:36.705 --> 00:37:39.325
So if I go here, I've already got it up on my page,

932
00:37:39.325 --> 00:37:42.445
but if I go 1, 2, 3, 4 in our first one,

933
00:37:42.835 --> 00:37:45.565
I've got two on my second one that's six, I've got seven,

934
00:37:46.155 --> 00:37:48.285
I've got eight, nine, I've got 10, 11.

935
00:37:48.315 --> 00:37:50.765
Well, my middle point is going to be my sixth point

936
00:37:50.765 --> 00:37:52.245
because there's five on either side.

937
00:37:52.385 --> 00:37:55.605
1, 2, 3, 4, 5, and then six and,

938
00:37:55.745 --> 00:37:58.125
and pardon me.

939
00:37:58.305 --> 00:38:02.685
And then 7, 8, 9, 10, 11. So there's my two points of 10.

940
00:38:04.065 --> 00:38:06.605
So having my two points of 10, um, I'm

941
00:38:06.725 --> 00:38:08.725
therefore going to have, um,

942
00:38:10.235 --> 00:38:12.175
my middle point being six.

943
00:38:12.435 --> 00:38:14.175
As you can see here, I've got four.

944
00:38:14.315 --> 00:38:17.015
And then if I'm counting down, I can say one and then two.

945
00:38:17.015 --> 00:38:18.095
That's my sixth point.

946
00:38:18.115 --> 00:38:20.055
If I'm counting up, you're welcome to do the top one.

947
00:38:20.055 --> 00:38:22.055
Doesn't really matter. Both gonna end up being two.

948
00:38:22.725 --> 00:38:24.065
You've got your median value.
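
NOTE
Editor's sketch of finding the median by counting dots. Only "four on one, two on two,
one on three" and the total of 11 points are stated in the lecture; the counts for the
values 4 and 5 are assumed for illustration.
```python
from statistics import median
# Assumed dot plot: value mapped to number of dots.
dots = {1: 4, 2: 2, 3: 1, 4: 2, 5: 2}
# Expand the dot plot back into an ordered list of raw values.
values = [value for value, count in sorted(dots.items()) for _ in range(count)]
n = len(values)
# For an odd number of points, the median sits at position (n + 1) / 2.
position = (n + 1) // 2
middle = values[position - 1]
print(n, position, middle, median(values))
```
With 11 points the sixth point is the middle one, with five points on either side.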

949
00:38:25.805 --> 00:38:27.345
Um, and then you've got your stem and leaf plots.

950
00:38:27.345 --> 00:38:29.785
Now stem and leaf plots can be utilized for continuous.

951
00:38:29.815 --> 00:38:32.905
Once again, probably still not all that useful,

952
00:38:33.045 --> 00:38:35.285
but you can use continuous data on it.

953
00:38:35.805 --> 00:38:37.045
Probably the more useful way.

954
00:38:37.705 --> 00:38:39.205
Um, we'll talk about it a bit,

955
00:38:39.265 --> 00:38:42.965
but it's more useful here to do numerical discrete.

956
00:38:42.975 --> 00:38:44.845
Again. Now you've got a stem.

957
00:38:45.345 --> 00:38:47.405
So essentially here we're saying that um,

958
00:38:47.865 --> 00:38:50.405
our stem represents 40, so four.

959
00:38:50.785 --> 00:38:54.045
And then whatever goes behind it is the, the one.

960
00:38:54.065 --> 00:38:55.845
So this example here would be 41.

961
00:38:56.875 --> 00:38:58.775
So your stem is the first digit, your leaf is

962
00:38:58.775 --> 00:38:59.855
the last digit, or digits.

963
00:39:00.155 --> 00:39:01.855
Uh, same with the stem. It can be more than one.

964
00:39:02.315 --> 00:39:04.095
Um, so example, as you can see, it's like 40.

965
00:39:04.195 --> 00:39:07.775
And then one, um, you always need to include a key here

966
00:39:07.775 --> 00:39:08.855
because in this case here,

967
00:39:08.855 --> 00:39:10.655
what if the four was representing 400

968
00:39:11.035 --> 00:39:13.135
and then the one was representing the tens.

969
00:39:13.135 --> 00:39:14.415
So it could be 410.

970
00:39:15.035 --> 00:39:17.215
Um, really important that you do that

971
00:39:17.215 --> 00:39:20.415
because in most cases, if you don't have a legend,

972
00:39:20.435 --> 00:39:22.255
you cannot utilize a stem and leaf plot.

973
00:39:22.275 --> 00:39:23.375
And your answer will be wrong

974
00:39:23.375 --> 00:39:24.535
if you draw a stem and leaf plot.

975
00:39:24.535 --> 00:39:26.295
But don't give a legend or a key.

976
00:39:27.735 --> 00:39:32.275
As you can see here, you get your stem 1, 2, 3, 4, 5, 6,

977
00:39:32.375 --> 00:39:34.755
and you've been told in your key that if it's one

978
00:39:34.815 --> 00:39:36.195
and then zero, it's 10.

979
00:39:36.615 --> 00:39:38.555
So if it's two and then zero, it's 20.

980
00:39:38.855 --> 00:39:42.795
If it's three, and then five, it's 35, something like that.

981
00:39:43.215 --> 00:39:44.275
So you can see here in your stem

982
00:39:44.275 --> 00:39:45.915
and leaf plot, you've got a whole bunch of values.

983
00:39:45.915 --> 00:39:47.595
They've been ordered. Um,

984
00:39:48.375 --> 00:39:50.635
and as you can see, this answer is 10.

985
00:39:51.015 --> 00:39:52.995
But if I said, oh, what if this was a hundred?

986
00:39:53.145 --> 00:39:55.355
Well then I'd have to make this a hundred.

987
00:39:55.615 --> 00:39:58.635
So it's really important that you know how to utilize that.

988
00:39:59.015 --> 00:40:03.475
You know how to apply that distinct methodology

989
00:40:04.135 --> 00:40:07.595
of getting your distinct methodology

990
00:40:07.655 --> 00:40:09.355
of utilizing a stem and leaf plot.

991
00:40:09.455 --> 00:40:11.035
It is a little bit difficult.

992
00:40:12.455 --> 00:40:16.395
Um, so really important, a leaf plot has to be ordered.

993
00:40:16.975 --> 00:40:19.155
Um, and you can split a stem and leaf plot.

994
00:40:19.155 --> 00:40:20.235
Now what I mean by that is,

995
00:40:20.295 --> 00:40:22.075
so let's just use two here for example.

996
00:40:22.095 --> 00:40:23.475
So you can see our two on our screen.

997
00:40:23.775 --> 00:40:25.315
We have a bit of information, we have four.

998
00:40:25.735 --> 00:40:27.915
Now what if I wanted to break that down

999
00:40:27.935 --> 00:40:30.395
and say, all right, I want two, two categories.

1000
00:40:30.505 --> 00:40:34.355
Well, then I'd go, you know, one, two and then two again.

1001
00:40:34.895 --> 00:40:37.715
Now the first two category, you go zero to four.

1002
00:40:38.015 --> 00:40:40.595
The second two category, you go five to nine.

1003
00:40:40.615 --> 00:40:41.675
Now you're welcome to do that.

1004
00:40:42.015 --> 00:40:43.795
If you do that, you need to do it for all of them.

1005
00:40:44.295 --> 00:40:46.595
So for example, in our four one, as much

1006
00:40:46.595 --> 00:40:48.155
as there's only three things there, you'd have

1007
00:40:48.155 --> 00:40:51.235
to do two fours, you go four and you'd have your two fours.

1008
00:40:51.235 --> 00:40:53.755
Next to that, you go four again, and you do your seven.

1009
00:40:54.025 --> 00:40:56.675
Same as six, you do six, and you go zero one,

1010
00:40:56.675 --> 00:40:58.755
and then you do another six below it and you do the nine.

1011
00:40:59.095 --> 00:41:01.275
So you are welcome to do that when you have lots

1012
00:41:01.275 --> 00:41:04.235
of information. Breaking down a stem and leaf plot into

1013
00:41:04.315 --> 00:41:06.155
doubles is really, really useful.

1014
00:41:06.855 --> 00:41:09.835
But only do it if it's, you know, you've got like six

1015
00:41:09.835 --> 00:41:11.795
or seven pieces of information in each one.

1016
00:41:12.715 --> 00:41:14.635
I think if it's less than that, like this one here,

1017
00:41:14.935 --> 00:41:16.155
it works as is.

1018
00:41:16.355 --> 00:41:17.995
I think once you get to like 6, 7, 8,

1019
00:41:18.225 --> 00:41:19.835
then it's useful breaking it down.

1020
00:41:19.935 --> 00:41:22.035
But before that, it's not that useful.

1021
00:41:22.475 --> 00:41:24.875
I wouldn't bother doing it. It's not all that.

1022
00:41:25.495 --> 00:41:27.875
Um, it's not the best thing you can do.

1023
00:41:28.415 --> 00:41:31.235
Um, and that's it for stem and leaf plots, sorry.
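
NOTE
Editor's sketch of an ordered stem and leaf plot with a key, plus the optional split
into leaves 0 to 4 and 5 to 9. The function name stem_and_leaf and the data set are
hypothetical; the data is not the set shown on the slide.
```python
from collections import defaultdict
def stem_and_leaf(values, split=False):
    """Return the lines of an ordered stem-and-leaf plot for whole numbers, key included."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # With split stems, leaves 0-4 go on the first row and 5-9 on the second.
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)
    stems = range(min(values) // 10, max(values) // 10 + 1)
    halves = (0, 1) if split else (0,)
    lines = ["Key: 4 | 1 = 41"]
    for stem in stems:
        for half in halves:
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines
# Hypothetical data set, not the one on the slide.
data = [10, 12, 20, 21, 23, 27, 35, 41, 44, 47, 52, 60, 61, 69]
print("\n".join(stem_and_leaf(data)))
print("\n".join(stem_and_leaf(data, split=True)))
```
When the split is used, every stem gets two rows, even if one of them ends up empty.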

1024
00:41:32.535 --> 00:41:34.125
Now, histograms.

1025
00:41:34.125 --> 00:41:36.645
Now this is where we talk more about numerical discrete.

1026
00:41:36.645 --> 00:41:38.605
And the best way to talk about this is usually putting

1027
00:41:38.605 --> 00:41:39.605
things into categories.

1028
00:41:39.945 --> 00:41:42.125
Um, not, not categories, intervals,

1029
00:41:42.125 --> 00:41:43.285
sorry, I shouldn't say categories.

1030
00:41:43.325 --> 00:41:45.245
I say intervals. Putting things into intervals.

1031
00:41:45.245 --> 00:41:46.245
Now the example that comes on the screen

1032
00:41:46.245 --> 00:41:46.925
doesn't have intervals.

1033
00:41:47.185 --> 00:41:49.965
Um, once it pops up, uh, no, it does have intervals.

1034
00:41:49.965 --> 00:41:51.925
Sorry, I did change this. The example we used

1035
00:41:51.925 --> 00:41:54.045
to have didn't have intervals and I was like, it annoyed me.

1036
00:41:54.045 --> 00:41:55.285
So I've obviously gone and changed that.

1037
00:41:55.285 --> 00:41:58.585
I did forget that I changed it. Intervals.

1038
00:41:58.885 --> 00:42:00.905
So intervals are really, really useful

1039
00:42:00.905 --> 00:42:03.145
because then all your continuous data, you're sort

1040
00:42:03.145 --> 00:42:04.865
of subcategorize it in a sense.

1041
00:42:05.045 --> 00:42:09.145
Um, but it's still numerical data, and you're not using the raw data.

1042
00:42:09.245 --> 00:42:12.505
So in this graph, you cannot find your mean or your mode.

1043
00:42:13.055 --> 00:42:14.625
It's not that useful.

1044
00:42:15.205 --> 00:42:16.225
You can find a median

1045
00:42:16.225 --> 00:42:18.945
or a mode in terms of your intervals, but that's not useful.

1046
00:42:19.125 --> 00:42:20.745
You get your raw data out

1047
00:42:20.845 --> 00:42:23.145
and that's when you find your mean, your median, your mode.

1048
00:42:23.495 --> 00:42:25.465
What you look at here, if you're able to describe this,

1049
00:42:25.465 --> 00:42:26.945
is you look more at the shape,

1050
00:42:27.325 --> 00:42:29.745
you look at the spread, you look at the center.

1051
00:42:29.925 --> 00:42:32.825
So you say, all right, um, our center,

1052
00:42:33.105 --> 00:42:34.825
a central spot is between five and nine.

1053
00:42:34.825 --> 00:42:38.185
So you don't talk about the exact median or the exact mean.

1054
00:42:38.525 --> 00:42:40.585
You talk about sort of your center spot,

1055
00:42:40.765 --> 00:42:41.825
and then you talk about outliers.

1056
00:42:41.825 --> 00:42:43.185
You might have a bit of outlier information,

1057
00:42:43.185 --> 00:42:44.625
like you might have a really big peak somewhere

1058
00:42:44.625 --> 00:42:45.785
and it's not that useful.

1059
00:42:46.885 --> 00:42:48.425
You talk about that as an outlier.

1060
00:42:49.445 --> 00:42:52.465
So that's where histograms become more common

1061
00:42:52.485 --> 00:42:53.825
for numerical continuous,

1062
00:42:53.825 --> 00:42:55.865
because then you sort of interval it, if you put

1063
00:42:55.865 --> 00:42:57.265
that information into intervals

1064
00:42:57.525 --> 00:43:00.025
and you get the relevant information out.

1065
00:43:00.165 --> 00:43:02.905
Now, numerical continuous data is most

1066
00:43:02.905 --> 00:43:04.025
useful when it's bivariate.

1067
00:43:04.355 --> 00:43:08.185
We'll talk about that a little bit later on, but that's

1068
00:43:08.185 --> 00:43:10.185
because sort of our bi-variate data

1069
00:43:10.695 --> 00:43:12.585
just provides a little bit

1070
00:43:12.655 --> 00:43:15.915
more, if that makes sense.

1071
00:43:15.975 --> 00:43:18.595
You get a little bit more ability to manipulate,

1072
00:43:18.595 --> 00:43:20.795
you get a little bit more ability to work with it

1073
00:43:21.015 --> 00:43:22.275
and a little bit more ability to sort

1074
00:43:22.275 --> 00:43:23.635
of just display it differently.

1075
00:43:24.015 --> 00:43:27.235
And that ability really, really shows

1076
00:43:27.825 --> 00:43:29.075
when you have continuous data.

1077
00:43:29.135 --> 00:43:31.315
And continuous data is far more useful when it's bivariate.

1078
00:43:31.465 --> 00:43:33.795
When it's univariate, it's not as useful.

1079
00:43:35.895 --> 00:43:39.915
So this is where univariate data gets really tough.

1080
00:43:39.975 --> 00:43:43.475
So univariate data as a whole is not all that difficult.

1081
00:43:43.825 --> 00:43:45.635
This is where it gets quite difficult.

1082
00:43:45.695 --> 00:43:47.555
And this is the point in this session,

1083
00:43:47.565 --> 00:43:49.275
which if you haven't really been paying attention,

1084
00:43:49.545 --> 00:43:52.195
that is fine, you should pay attention right now.

1085
00:43:52.705 --> 00:43:55.115
This is where things get a little bit difficult.

1086
00:43:55.255 --> 00:43:58.395
So sometimes we have a data range and it's too big.

1087
00:43:58.575 --> 00:44:00.435
The most common example of this is if you look at the

1088
00:44:00.435 --> 00:44:03.035
populations of the world, you look at somewhere like Samoa,

1089
00:44:03.055 --> 00:44:05.435
and I think Samoa is somewhere around, you know,

1090
00:44:05.785 --> 00:44:07.355
9,000, 10,000 people.

1091
00:44:07.355 --> 00:44:09.315
Even like the Vatican City, Vatican City.

1092
00:44:09.475 --> 00:44:11.635
I don't, don't ask me how that is a country, but

1093
00:44:11.635 --> 00:44:14.875
nonetheless, Vatican City is something like 300 people.

1094
00:44:15.545 --> 00:44:18.685
It's really, really small. Same with somewhere like Samoa.

1095
00:44:18.695 --> 00:44:21.725
Samoa has something like 9,000 or 10,000 people in it.

1096
00:44:21.985 --> 00:44:23.605
So that's really small.

1097
00:44:23.745 --> 00:44:25.845
But then you compare it with somewhere like China

1098
00:44:25.865 --> 00:44:27.885
or India, which have over a billion people.

1099
00:44:28.105 --> 00:44:29.925
You put them on the same histogram

1100
00:44:30.305 --> 00:44:31.925
and it's just, it just doesn't work.

1101
00:44:31.925 --> 00:44:34.965
You're too far apart. You cannot work with that.

1102
00:44:35.195 --> 00:44:37.205
Your histogram's gonna take up more than one page

1103
00:44:37.205 --> 00:44:38.925
because you're just spreading it out.

1104
00:44:39.105 --> 00:44:41.645
Unless you, you, you do really big categories

1105
00:44:41.645 --> 00:44:44.165
and then that sort of skews it, it's not right, you're sort

1106
00:44:44.165 --> 00:44:45.685
of manipulating it incorrectly.

1107
00:44:47.225 --> 00:44:49.085
So in these cases, we use

1108
00:44:49.085 --> 00:44:50.925
what is called a log scale histogram.

1109
00:44:51.025 --> 00:44:52.525
And you might say, what the heck is that?

1110
00:44:53.805 --> 00:44:55.165
A log scale works like this.

1111
00:44:55.385 --> 00:44:58.765
So normal, a normal scale goes, you know, zero to 10, 10

1112
00:44:58.765 --> 00:45:00.645
to 20, 20 to 30, 30 to 40.

1113
00:45:00.705 --> 00:45:02.045
So you have 10 in between them.

1114
00:45:02.145 --> 00:45:03.445
That's how a normal one works.

1115
00:45:03.585 --> 00:45:06.005
So if I've got 10 in between them, I'm counting up.

1116
00:45:06.425 --> 00:45:09.405
So we use pluses and minuses.

1117
00:45:09.405 --> 00:45:13.245
On a normal scale, we plus this much or we minus this much.

1118
00:45:14.945 --> 00:45:18.785
A log scale works by multiplication or division.

1119
00:45:19.645 --> 00:45:24.305
So one to 10, multiply by 10. 10 to 20, ah, 10

1120
00:45:24.305 --> 00:45:26.585
to a hundred, we're multiplying by another 10.

1121
00:45:26.965 --> 00:45:28.305
If I go a hundred to a thousand,

1122
00:45:28.365 --> 00:45:29.465
I'm multiplying by another 10.

1123
00:45:29.805 --> 00:45:34.665
So a log scale puts things in categories of one to 10, 10

1124
00:45:34.665 --> 00:45:38.625
to a hundred, a hundred to 1000, 1000 to 10,000.

1125
00:45:38.965 --> 00:45:42.225
So the intervals are all different sizes.

1126
00:45:42.255 --> 00:45:44.585
However, they are the same width

1127
00:45:44.645 --> 00:45:47.145
apart. You're not making the intervals different widths

1128
00:45:47.145 --> 00:45:49.905
apart, you're keeping them the same width apart,

1129
00:45:50.565 --> 00:45:53.985
but what is in the interval is very distinctly different.

1130
00:45:54.005 --> 00:45:55.545
The first interval is made up of 10 numbers.

1131
00:45:55.685 --> 00:45:57.465
The next interval is made up of 90 numbers.

1132
00:45:57.845 --> 00:45:59.545
The next interval is made up of 900.

1133
00:45:59.885 --> 00:46:04.585
The next interval is made up of 9,000, et cetera.
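
NOTE
Editor's sketch of how a log scale groups raw values into the intervals 1 to 10,
10 to 100, 100 to 1,000 and so on. The country labels and populations are made-up
illustrative values, not data from the lecture.
```python
import math
from collections import Counter
# Rough, illustrative populations (assumed values).
populations = {"A": 800, "B": 10_500, "C": 5_000_000, "D": 26_000_000, "E": 1_400_000_000}
# floor(log10(x)) picks the interval: 0 for 1 to 10, 1 for 10 to 100, 2 for 100 to 1,000, and so on.
bins = Counter(math.floor(math.log10(p)) for p in populations.values())
for k in sorted(bins):
    print(f"log10 from {k} to {k + 1} (raw {10**k:,} to {10**(k + 1):,}): {bins[k]}")
```
Each bar is one unit wide on the log axis even though the raw range it covers grows tenfold each time.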

1134
00:46:06.045 --> 00:46:07.265
Um, and so

1135
00:46:07.265 --> 00:46:11.145
therefore a log scale gives these numbers.

1136
00:46:11.445 --> 00:46:14.345
So there are a few properties I want you to understand

1137
00:46:14.345 --> 00:46:15.625
before I sort of describe them.

1138
00:46:16.785 --> 00:46:19.565
If the raw piece of data is greater than one,

1139
00:46:20.595 --> 00:46:24.205
its log value is greater than zero.

1140
00:46:25.785 --> 00:46:27.605
If a number is greater than zero.

1141
00:46:27.705 --> 00:46:31.445
So the raw piece of data, so there is, it's the, it's

1142
00:46:31.445 --> 00:46:33.085
between zero and it's between one.

1143
00:46:33.105 --> 00:46:35.885
So it's like 0.5 is the piece of data that you've got.

1144
00:46:36.435 --> 00:46:38.605
Its log value is actually negative.

1145
00:46:39.065 --> 00:46:40.925
So you're gonna get a negative log value.

1146
00:46:41.345 --> 00:46:43.245
So your log value will be negative.

1147
00:46:43.785 --> 00:46:45.205
Now you do this all and you cover that.

1148
00:46:47.285 --> 00:46:48.505
If the number is zero,

1149
00:46:48.765 --> 00:46:50.385
so they actually didn't have anything.

1150
00:46:50.405 --> 00:46:51.705
So you know, you are measuring

1151
00:46:51.705 --> 00:46:53.625
and there was nothing, it's undefined.

1152
00:46:53.625 --> 00:46:55.225
You're not gonna be able to log that, don't worry.

1153
00:46:55.885 --> 00:46:58.425
And for some reason you are measuring something

1154
00:46:58.425 --> 00:47:00.185
that goes into negatives, maybe temperature.

1155
00:47:00.485 --> 00:47:02.745
If the temperature is in its negatives,

1156
00:47:03.245 --> 00:47:05.465
you actually cannot utilize logs.

1157
00:47:05.965 --> 00:47:08.745
So there's no way of you logging a negative number.

1158
00:47:08.925 --> 00:47:10.545
It doesn't work. So

1159
00:47:10.545 --> 00:47:14.705
therefore what we're saying is that if you are, you know,

1160
00:47:14.705 --> 00:47:17.025
measuring temperatures and they're really, really far apart

1161
00:47:17.025 --> 00:47:18.505
and you want to use log scales,

1162
00:47:18.505 --> 00:47:19.705
but then you get a negative temperature,

1163
00:47:19.705 --> 00:47:21.785
well you're not gonna actually be able to plot that piece

1164
00:47:21.785 --> 00:47:23.065
of information on the log scale.
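
NOTE
Editor's sketch of the four cases just described: raw values above one, between zero
and one, exactly zero, and negative. The helper name log_value is hypothetical; it
returns None where the log is not defined so the value can be left off the plot.
```python
import math
def log_value(x):
    """Return log10(x), or None when the raw value cannot be logged (zero or negative)."""
    if x <= 0:
        return None
    return math.log10(x)
for raw in (150, 1, 0.5, 0, -3):
    print(raw, log_value(raw))
```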

1165
00:47:25.085 --> 00:47:27.945
So when displaying logs on an axis, we use

1166
00:47:28.685 --> 00:47:30.045
the order of magnitude.

1167
00:47:30.425 --> 00:47:31.965
So 10 to the two becomes two,

1168
00:47:31.995 --> 00:47:34.765
therefore we must label the axis as log(variable).

1169
00:47:34.765 --> 00:47:36.165
Now this is what I mean by this.

1170
00:47:36.465 --> 00:47:38.005
So essentially,

1171
00:47:38.145 --> 00:47:39.405
and again, I didn't

1172
00:47:39.405 --> 00:47:40.565
fix this one, I should have fixed this one.

1173
00:47:40.565 --> 00:47:41.565
This histogram is wrong as well.

1174
00:47:42.415 --> 00:47:47.045
These are intervals, think of this interval as zero to one.

1175
00:47:47.345 --> 00:47:49.805
So the first one here, as you can see on the screen,

1176
00:47:50.035 --> 00:47:51.845
this is zero to one.

1177
00:47:52.465 --> 00:47:55.205
So if I'm between a log of zero

1178
00:47:56.065 --> 00:47:59.155
to one, where am I?

1179
00:47:59.295 --> 00:48:01.555
So remember we discussed on our last page

1180
00:48:01.825 --> 00:48:06.715
that if the raw piece of data is greater than one, then

1181
00:48:07.255 --> 00:48:09.395
our log is greater than zero.

1182
00:48:09.395 --> 00:48:10.555
Well, if I'm between zero

1183
00:48:10.615 --> 00:48:15.555
and one, that means my raw data is between one and 10.

1184
00:48:16.215 --> 00:48:19.955
So this is between zero and one on a log, it's between one

1185
00:48:19.955 --> 00:48:21.435
and 10 in my raw data.

1186
00:48:22.375 --> 00:48:25.275
If I'm between next category, we'll say this is

1187
00:48:25.275 --> 00:48:26.755
between one and two.

1188
00:48:27.635 --> 00:48:30.175
So we'll say, all right, this is between one and two.

1189
00:48:30.795 --> 00:48:32.295
Now being between one

1190
00:48:32.555 --> 00:48:37.535
and two, I'd say that my log is between 10 and a hundred.

1191
00:48:37.755 --> 00:48:40.655
So my raw data is between 10 and a hundred,

1192
00:48:41.195 --> 00:48:43.455
but I'm saying it's between a log of one and two.

1193
00:48:44.505 --> 00:48:47.235
Next one I say, all right, it's between two and three.

1194
00:48:47.235 --> 00:48:49.195
Again, this, this got squished down.

1195
00:48:49.235 --> 00:48:51.715
I do apologize, but I don't know why a lot of the, um,

1196
00:48:53.085 --> 00:48:54.365
graphs when I was, it's

1197
00:48:54.405 --> 00:48:55.805
'cause I changed the shape of the slides

1198
00:48:55.825 --> 00:48:57.845
to fit the full screen and they,

1199
00:48:57.845 --> 00:48:59.165
they got dragged down. I do apologize.

1200
00:48:59.245 --> 00:49:01.005
I should have edited that out.

1201
00:49:01.345 --> 00:49:03.565
But let's just say the next one's between two and three.

1202
00:49:03.865 --> 00:49:05.405
So our next one here is between two

1203
00:49:05.405 --> 00:49:07.605
and three being between two and three.

1204
00:49:08.425 --> 00:49:11.725
We say that is between 100 and 1,000.

1205
00:49:11.865 --> 00:49:13.525
So any piece of information measured is

1206
00:49:13.525 --> 00:49:14.925
between 100 and 1,000.

1207
00:49:14.925 --> 00:49:16.285
And that's why we call this log.

1208
00:49:16.585 --> 00:49:17.925
And then what the variable is,

1209
00:49:18.385 --> 00:49:20.045
so if we're talking about population,

1210
00:49:20.045 --> 00:49:21.925
we'd say log bracket population.

1211
00:49:23.185 --> 00:49:25.965
And as you can see here, this is what our log guide means.

1212
00:49:26.465 --> 00:49:29.445
So your log is the value up here.

1213
00:49:29.595 --> 00:49:32.445
It's this value that the, the 10 is to the power of.

1214
00:49:32.945 --> 00:49:36.965
So a log of two is essentially 10 to the two.

1215
00:49:37.705 --> 00:49:39.325
And it is 100.

1216
00:49:40.005 --> 00:49:42.645
A log of three is essentially 10 to the three.

1217
00:49:43.265 --> 00:49:46.165
10 to the power of three. 10 to the power of three is 1,000.

1218
00:49:46.605 --> 00:49:49.605
A log of negative one is equal to 0.1.

1219
00:49:49.865 --> 00:49:53.525
So I'm between, if my interval is between a log, a log value

1220
00:49:53.525 --> 00:49:57.645
of negative one to zero, I'm between 0.1 and one.

1221
00:49:59.595 --> 00:50:02.615
Now if I'm between a log value of zero and one.

1222
00:50:02.915 --> 00:50:06.335
I'm between one and 10. So that is your guide.
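
NOTE
Editor's summary of the log guide in symbols, assuming base 10 throughout as in the
lecture: the log value is the power that 10 is raised to.
```latex
\log_{10}(x) = k \iff 10^{k} = x, \qquad
\log_{10}(0.1) = -1,\quad \log_{10}(1) = 0,\quad \log_{10}(10) = 1,\quad
\log_{10}(100) = 2,\quad \log_{10}(1000) = 3
```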

1223
00:50:07.415 --> 00:50:09.595
Now, how do I actually calculate logs?

1224
00:50:09.595 --> 00:50:13.355
Like what if I've got a value of, you know, 245?

1225
00:50:13.355 --> 00:50:14.595
Well, I've got a value of 203.

1226
00:50:14.655 --> 00:50:16.435
Oh, maybe we'll use something. I do something else.

1227
00:50:16.505 --> 00:50:18.515
What about if I say I have a value of 150,

1228
00:50:19.345 --> 00:50:20.385
I have a value of 150.

1229
00:50:20.685 --> 00:50:23.585
Now you might say to me, um, I think that is going

1230
00:50:23.585 --> 00:50:24.745
to be 1.5.

1231
00:50:25.845 --> 00:50:27.645
Um, no, sorry, yeah, 1.5.

1232
00:50:27.675 --> 00:50:31.405
I've got a log of, no wait, that's 15. I'm gonna say 150.

1233
00:50:31.415 --> 00:50:35.125
Sorry, I've got a log of 150.

1234
00:50:35.785 --> 00:50:38.525
So you might say to me, being a log of one point of,

1235
00:50:38.825 --> 00:50:41.045
of 150, you could figure out where it's gonna be

1236
00:50:41.425 --> 00:50:44.085
and you could say, you know, it's gonna be close to two.

1237
00:50:44.085 --> 00:50:45.125
Well, that's probably right.

1238
00:50:46.085 --> 00:50:47.925
I was trying to think of probably a better way

1239
00:50:47.925 --> 00:50:49.885
of saying it is what if I thought of a log

1240
00:50:49.985 --> 00:50:51.405
of, we'll go back here.

1241
00:50:52.715 --> 00:50:54.735
I'm looking at a raw piece of data

1242
00:50:54.795 --> 00:50:58.375
and my raw piece of data is 5.5.

1243
00:50:58.795 --> 00:51:02.175
So 5.5 is smack bang in the middle of one and 10

1244
00:51:02.175 --> 00:51:04.495
because there are nine numbers between one and 10.

1245
00:51:04.715 --> 00:51:05.855
The middle is four and a half.

1246
00:51:05.955 --> 00:51:08.495
So I add four and a half to one. I get five and a half.

1247
00:51:08.915 --> 00:51:10.055
So I'm at five and a half.

1248
00:51:10.235 --> 00:51:12.135
My raw piece of data is five and a half.

1249
00:51:13.165 --> 00:51:14.465
My raw piece of data being five

1250
00:51:14.465 --> 00:51:16.545
and a half being smack bang in the middle of one

1251
00:51:16.545 --> 00:51:19.025
and 10 being, I would assume it would be smack

1252
00:51:19.025 --> 00:51:20.145
bang in the middle of zero.

1253
00:51:20.285 --> 00:51:23.665
And one, what you'll find is it's actually not,

1254
00:51:24.835 --> 00:51:28.365
because log scales go up very differently.

1255
00:51:28.635 --> 00:51:31.445
They don't go up uniformly because of that multiplication.

1256
00:51:31.585 --> 00:51:34.405
And therefore your actual log value

1257
00:51:35.095 --> 00:51:39.005
might be more like, you know, a 0.4

1258
00:51:39.145 --> 00:51:40.365
or a 0.6.

1259
00:51:40.475 --> 00:51:43.125
It's not gonna be 0.5,

1260
00:51:43.125 --> 00:51:45.805
it's not gonna be smack bang in the middle even though the

1261
00:51:45.865 --> 00:51:47.645
raw data is smack bang in the middle.
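
NOTE
Editor's sketch, not part of the spoken lecture: a quick check of the midpoint claim above.
It assumes a base-10 log scale where a raw value of 1 sits at log 0 and 10 sits at log 1.
The variable names are illustrative only.
```python
import math
# Raw value halfway between 1 and 10 on the raw scale.
raw_midpoint = (1 + 10) / 2           # 5.5
# Its position on the log10 scale, where 1 maps to 0 and 10 maps to 1.
log_position = math.log10(raw_midpoint)
print(round(log_position, 2))         # 0.74, not 0.5
```
So the raw midpoint lands roughly three quarters of the way along the log interval, which is why the calculator is needed.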

1262
00:51:48.665 --> 00:51:52.165
Now always, always, always use your calculator

1263
00:51:52.185 --> 00:51:54.765
to calculate your log value.

1264
00:51:54.905 --> 00:51:57.005
So if I wanna find a log down, I've been given the raw data

1265
00:51:57.005 --> 00:52:00.365
and I say, all right, I've been given 150 as my raw data.

1266
00:52:01.285 --> 00:52:05.765
I go log 10 of 150. Now this will be in your calculator.

1267
00:52:05.765 --> 00:52:07.525
You'll be able to find this in your calculator.

1268
00:52:07.525 --> 00:52:11.325
There is a log, um, in your keyboards,

1269
00:52:11.555 --> 00:52:12.925
your Casio ClassPad,

1270
00:52:12.985 --> 00:52:14.285
or if you're on your TI-Nspire,

1271
00:52:14.285 --> 00:52:16.565
there's actually just like a shift shortcut to one

1272
00:52:16.565 --> 00:52:20.005
of the buttons and you find that you'll get your log 10

1273
00:52:20.545 --> 00:52:21.765
and then it'll give you brackets

1274
00:52:21.765 --> 00:52:23.285
and you just have to put the number in the brackets

1275
00:52:23.285 --> 00:52:25.285
and press enter and you get your answer.

1276
00:52:25.585 --> 00:52:29.125
So as you can see here, your log 10 of 150 is 2.1,

1277
00:52:29.125 --> 00:52:30.845
which if I go back, it makes sense.

1278
00:52:30.915 --> 00:52:32.365
It's between two and three.

1279
00:52:33.105 --> 00:52:37.065
And also really important here is to identify

1280
00:52:37.205 --> 00:52:38.545
how a log scale is working.

1281
00:52:39.285 --> 00:52:43.185
150 is only 50 of the 900.

1282
00:52:43.615 --> 00:52:47.825
There's 900 raw pieces of data between 100 and 1,000.

1283
00:52:48.365 --> 00:52:52.145
And I'm only 50 in, I'd assume that I am,

1284
00:52:52.565 --> 00:52:56.945
I'm actually less than 10% in, so 10% of 900 is 90.

1285
00:52:57.405 --> 00:53:01.105
So if I was going up uniformly, my log

1286
00:53:01.205 --> 00:53:04.545
of 190 would be 2.1 and

1287
00:53:04.545 --> 00:53:08.065
therefore my log of 150 would be less than 2.1.

1288
00:53:08.325 --> 00:53:10.585
But my log of 150 is more than 2.1.

1289
00:53:10.805 --> 00:53:12.505
That's because logs go up differently.

1290
00:53:12.775 --> 00:53:15.265
They go up on sort of an exponential sort of way.

1291
00:53:15.685 --> 00:53:19.305
Um, they go up non uniformly and

1292
00:53:19.305 --> 00:53:23.465
therefore your log values do not exactly match the exact

1293
00:53:23.785 --> 00:53:28.045
position that your value is really important to understand.

1294
00:53:28.045 --> 00:53:29.685
That's why you need to calculate it every time.
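
NOTE
Editor's sketch, not part of the spoken lecture: the uniform-versus-log comparison above, worked in code.
It assumes base-10 logs, with raw 100 at log 2 and raw 1000 at log 3.
The name uniform_guess is hypothetical and stands for the straight-line estimate the lecturer warns against.
```python
import math
raw = 150
low, high = 100, 1000                 # raw values at log 2 and log 3
# Where 150 would sit if the axis went up uniformly between 2 and 3.
uniform_guess = 2 + (raw - low) / (high - low)
# Where it really sits on a log10 scale.
actual_log = math.log10(raw)
print(round(uniform_guess, 3))        # 2.056
print(round(actual_log, 3))           # 2.176
```
The true log value is well above the straight-line estimate, matching the point that 150 comes out above 2.1.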

1295
00:53:29.685 --> 00:53:32.645
So put in the calculator. Now what if I have a log number

1296
00:53:32.745 --> 00:53:35.165
and I want to find what my actual value was?

1297
00:53:35.165 --> 00:53:36.365
Well, if I'm given a log number,

1298
00:53:36.685 --> 00:53:38.165
I just go 10 to the power of the log.

1299
00:53:38.385 --> 00:53:41.965
So if I was given a log of 1.683, I go 10

1300
00:53:42.305 --> 00:53:43.885
to the power of 1.683

1301
00:53:44.465 --> 00:53:48.965
and being a log of 1.683, I assume it's between 10

1302
00:53:49.025 --> 00:53:51.045
and 100 because I've got this scale here.

1303
00:53:51.045 --> 00:53:52.365
Well, it's 1.683.

1304
00:53:52.385 --> 00:53:54.845
It needs to be between these two pieces of raw data.

1305
00:53:55.385 --> 00:53:57.165
It is, it's 48, it's perfect.

1306
00:53:57.555 --> 00:54:00.725
It's perfectly, uh, it's perfectly being pulled out

1307
00:54:01.105 --> 00:54:05.845
and it's in between our 10 and our 100 because it's 48.
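
NOTE
Editor's sketch, not part of the spoken lecture: converting a log value back to raw data.
It assumes the log is base 10, as used throughout this section.
```python
log_value = 1.683
raw_value = 10 ** log_value           # undo a log10 by raising 10 to that power
print(round(raw_value))               # 48
assert 10 < raw_value < 100           # a log between 1 and 2 means a raw value between 10 and 100
```
The range check mirrors the sanity check in the lecture: a log between 1 and 2 has to come back as a raw value between 10 and 100.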

1308
00:54:07.025 --> 00:54:08.325
Always use your calculator.

1309
00:54:08.475 --> 00:54:10.845
This is a really good example to have in your summary book.

1310
00:54:10.995 --> 00:54:12.165
Same with this one back here.

1311
00:54:12.165 --> 00:54:14.125
Having these two in your summary book next to each other

1312
00:54:14.395 --> 00:54:17.085
with then an example of a question such as this.

1313
00:54:17.435 --> 00:54:19.605
This is a VCAA 2016 question.

1314
00:54:19.665 --> 00:54:21.765
It was really poorly answered,

1315
00:54:21.765 --> 00:54:23.805
like extremely poorly answered.

1316
00:54:24.695 --> 00:54:25.895
I want you all to have a go at it.

1317
00:54:25.975 --> 00:54:27.655
I know this is your first time seeing logs,

1318
00:54:28.075 --> 00:54:30.175
so if you do struggle with it, that is okay.

1319
00:54:30.205 --> 00:54:31.535
This is your first time seeing logs.

1320
00:54:31.535 --> 00:54:33.415
I don't expect you all to be, um,

1321
00:54:33.825 --> 00:54:35.255
acing this straight off the bat,

1322
00:54:35.715 --> 00:54:37.095
but I do want you to have a go at it.

1323
00:54:37.095 --> 00:54:39.135
So as you can see here, we've got a histogram below

1324
00:54:39.485 --> 00:54:40.975
that shows the distribution of number

1325
00:54:40.975 --> 00:54:42.455
of billionaires per million people

1326
00:54:42.565 --> 00:54:45.375
for the same 53 countries as question six.

1327
00:54:45.375 --> 00:54:46.935
Don't worry about the country, it doesn't really matter

1328
00:54:47.755 --> 00:54:50.305
based on this, the number of countries with one

1329
00:54:50.325 --> 00:54:54.025
or more billionaires per million people is, so

1330
00:54:54.925 --> 00:54:56.815
this is the number of billionaires per million

1331
00:54:57.435 --> 00:54:58.935
and it's counted in each country.

1332
00:54:59.035 --> 00:55:02.255
So they counted how many billionaires per million people

1333
00:55:02.345 --> 00:55:04.095
there were in each country,

1334
00:55:04.795 --> 00:55:06.855
and then each country they were like, all right,

1335
00:55:06.855 --> 00:55:09.255
so there are five billionaires per million people.

1336
00:55:09.255 --> 00:55:11.375
Alright, we'll do a log scale and we put it in.

1337
00:55:11.795 --> 00:55:14.685
So it says here, the number of countries with,

1338
00:55:14.755 --> 00:55:17.645
with the number of countries with one

1339
00:55:17.785 --> 00:55:21.925
or more billionaires per million people is, so how many

1340
00:55:21.925 --> 00:55:23.165
of these countries, how many

1341
00:55:23.165 --> 00:55:27.945
of these people in this graph here display more than one

1342
00:55:29.335 --> 00:55:31.105
billionaire per million people?

1343
00:55:31.245 --> 00:55:33.665
So more than one. But remember this is on a log

1344
00:55:33.665 --> 00:55:36.245
scale, really important.

1345
00:55:37.385 --> 00:55:41.645
So have a go at this question. I will give you, um,

1346
00:55:42.715 --> 00:55:44.165
I'll give you some time again.

1347
00:55:44.335 --> 00:55:48.835
Pause it, 3, 2, 1, pause and have a go.

1348
00:55:48.945 --> 00:55:50.905
Yeah, if you're back

1349
00:55:50.905 --> 00:55:53.385
and you're confused, I will explain a little bit more about

1350
00:55:53.385 --> 00:55:54.665
it, then you'd be welcome to pause it again,

1351
00:55:54.665 --> 00:55:55.665
have another go at it.

1352
00:55:55.885 --> 00:55:57.945
Um, but if not, we'll jump into it.

1353
00:55:58.325 --> 00:56:01.785
So as you can see here, we have a log scale

1354
00:56:02.165 --> 00:56:05.185
and we want to know the raw number of people

1355
00:56:06.255 --> 00:56:09.235
or the raw frequency of more than one.

1356
00:56:09.695 --> 00:56:11.315
So what is a log of one?

1357
00:56:11.385 --> 00:56:12.595
Well, the first thing I would

1358
00:56:12.595 --> 00:56:13.715
do is I'd go to my summary book.

1359
00:56:13.935 --> 00:56:15.595
So I'd go to my summary book and I'd go here.

1360
00:56:16.915 --> 00:56:20.855
So hoping this clicks in, I'd go to here in my summary book

1361
00:56:20.855 --> 00:56:22.055
and I'd go, all right, what is one?

1362
00:56:22.165 --> 00:56:25.365
Well, one in my summary book is 10 to the zero,

1363
00:56:25.535 --> 00:56:28.405
which is a log, a log number of zero.

1364
00:56:28.905 --> 00:56:31.885
So if it's a log number of zero, I need to go to here

1365
00:56:32.545 --> 00:56:36.765
and a log number of zero is here, I draw a line up here just

1366
00:56:36.765 --> 00:56:38.285
to cut off everything to the left.

1367
00:56:38.405 --> 00:56:39.765
I don't care about anything to the left.

1368
00:56:39.885 --> 00:56:42.365
I only want the stuff that is on this right side here.

1369
00:56:43.075 --> 00:56:45.295
So being on this right side, I look at the frequencies

1370
00:56:45.295 --> 00:56:46.975
and I go, all right, well, between zero

1371
00:56:46.995 --> 00:56:50.535
and one means there is one to 10.

1372
00:56:52.255 --> 00:56:54.755
So the number of billionaires per million people is

1373
00:56:54.755 --> 00:56:56.755
somewhere between one and 10 for those countries.

1374
00:56:56.755 --> 00:56:58.115
So there are nine countries there.

1375
00:56:58.275 --> 00:57:00.075
'cause I go and I check it on my frequency

1376
00:57:00.075 --> 00:57:03.355
and my frequency is at nine go across, I get nine.

1377
00:57:03.935 --> 00:57:07.275
So there are nine countries that have more than one

1378
00:57:07.825 --> 00:57:11.875
between one and 10 billionaires per million people.

1379
00:57:12.775 --> 00:57:13.995
And then I go here

1380
00:57:13.995 --> 00:57:17.355
and I say, right, there is one country here that has

1381
00:57:17.355 --> 00:57:21.355
between 10 to a hundred billionaires per million.

1382
00:57:21.415 --> 00:57:22.435
So they could have, you know,

1383
00:57:22.485 --> 00:57:24.715
40 billionaires per million people.

1384
00:57:25.125 --> 00:57:27.275
These, these countries here could have, you know,

1385
00:57:27.505 --> 00:57:30.475
five billionaires per million people.

1386
00:57:31.015 --> 00:57:33.315
So what's really important is that these,

1387
00:57:33.445 --> 00:57:34.875
these are the two you need to add together.

1388
00:57:35.015 --> 00:57:36.635
You get an answer of 10.
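
NOTE
Editor's sketch, not part of the spoken lecture: the counting step for the histogram question above.
Only the two column frequencies quoted in the lecture (9 and 1) are used.
The columns to the left of log 0 are left out because their frequencies are not stated here.
The dictionary layout is an illustrative assumption.
```python
import math
# Frequencies read off the histogram for the two columns at or above log 0,
# keyed by (lower log edge, upper log edge).
known_columns = {(0, 1): 9, (1, 2): 1}
threshold_log = math.log10(1)         # one billionaire per million is log 0
countries = sum(freq for (lower, _), freq in known_columns.items() if lower >= threshold_log)
print(countries)                      # 10
```
Converting the raw threshold to a log value first is the step most students skipped when they answered 1.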

1389
00:57:37.375 --> 00:57:39.395
One of the most common answers on this was one.

1390
00:57:40.075 --> 00:57:41.675
'cause people looked at it and went, oh, log,

1391
00:57:41.715 --> 00:57:43.995
I don't really understand it, it's, but it's one's here.

1392
00:57:44.095 --> 00:57:45.715
So there's a frequency of one above it.

1393
00:57:45.735 --> 00:57:47.875
So I'm gonna say, all right, it's, it's one.

1394
00:57:48.895 --> 00:57:51.275
Please, please, please do not answer a question like that.

1395
00:57:51.275 --> 00:57:52.275
Please do not just jump into it

1396
00:57:52.275 --> 00:57:53.675
and go, I, I'm really confused by logs,

1397
00:57:53.675 --> 00:57:54.715
I'm just gonna go with it.

1398
00:57:55.255 --> 00:57:56.835
Try and have a go at this question.

1399
00:57:56.935 --> 00:58:00.595
If you have that, that breakdown that I went to, that slide

1400
00:58:00.595 --> 00:58:01.835
that I went to, you have something like

1401
00:58:01.835 --> 00:58:04.555
that in your summary book with examples such as this.

1402
00:58:06.185 --> 00:58:08.065
I know in this question this was less relevant,

1403
00:58:08.285 --> 00:58:10.065
but having examples like this will make

1404
00:58:10.065 --> 00:58:11.345
this distinctly easier.

1405
00:58:11.925 --> 00:58:13.345
So really important you work through that.

1406
00:58:14.445 --> 00:58:17.205
Now let's move on from logs

1407
00:58:17.205 --> 00:58:18.805
because I don't wanna spend this whole time on logs,

1408
00:58:18.805 --> 00:58:20.165
but logs is one of the more difficult things

1409
00:58:20.165 --> 00:58:22.355
that you will cover pushing through.

1410
00:58:22.845 --> 00:58:24.195
Let's move on to the five figure summary.

1411
00:58:24.225 --> 00:58:26.875
This is the next sort of difficult part of this.

1412
00:58:27.075 --> 00:58:29.595
I think it's, I think you'll be okay with this.

1413
00:58:29.915 --> 00:58:31.195
I think most people cover this pretty well,

1414
00:58:31.195 --> 00:58:32.595
but the first time you see it might be like, oh,

1415
00:58:32.595 --> 00:58:33.595
it's a little bit more confusing.

1416
00:58:34.015 --> 00:58:35.915
So five figure summary.

1417
00:58:36.015 --> 00:58:37.635
The five figure summary is made up of a minimum.

1418
00:58:38.455 --> 00:58:39.875
The quartile one, the median,

1419
00:58:39.875 --> 00:58:41.595
the quartile three and the maximum value.

1420
00:58:41.695 --> 00:58:44.075
Now the best way of looking at this is on box plots,

1421
00:58:44.075 --> 00:58:45.315
which we haven't been through yet.

1422
00:58:45.315 --> 00:58:46.515
And this is why I kept it.

1423
00:58:46.515 --> 00:58:47.795
We're gonna go through box plots here.

1424
00:58:49.195 --> 00:58:50.855
So you can work this out on the calculator.

1425
00:58:51.475 --> 00:58:56.035
If you are given all the data in a completely obscure way,

1426
00:58:56.105 --> 00:58:58.795
it's not in order, you can just put

1427
00:58:58.795 --> 00:59:01.235
that all into your calculator. You may be given like 30 values in

1428
00:59:01.235 --> 00:59:02.315
a, in a SAC exam.

1429
00:59:02.415 --> 00:59:03.835
And the values are not in order.

1430
00:59:04.025 --> 00:59:06.315
Just go through and put the values into calculator.

1431
00:59:06.315 --> 00:59:08.475
You don't need to order them. You can just put them in in a

1432
00:59:08.475 --> 00:59:09.795
sheet and then you can go through

1433
00:59:09.795 --> 00:59:12.555
and do this all you can go, you can go to your settings,

1434
00:59:12.615 --> 00:59:13.915
you can go to your keyboard and go,

1435
00:59:14.195 --> 00:59:15.995
I want a five figure summary and it'll come out,

1436
00:59:15.995 --> 00:59:17.235
it'll give you a five figure summary.

1437
00:59:17.455 --> 00:59:19.835
You can even make it order it for you, so it can order all

1438
00:59:19.835 --> 00:59:21.325
that data, et cetera.

1439
00:59:21.585 --> 00:59:23.205
So cool. But if I've got this data here

1440
00:59:23.205 --> 00:59:24.645
and I say alright, I want my five figure summary.

1441
00:59:24.955 --> 00:59:26.805
Well first of all I'm gonna say, alright,

1442
00:59:26.875 --> 00:59:28.005
well how many numbers do I have?

1443
00:59:28.085 --> 00:59:30.085
I have 14 numbers. That's the most important thing.

1444
00:59:30.145 --> 00:59:32.445
How many numbers do you have? Well I have 14 numbers here.

1445
00:59:32.445 --> 00:59:33.605
I've counted it out. I've got

1446
00:59:33.605 --> 00:59:38.445
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14. So 14 numbers.

1447
00:59:39.705 --> 00:59:42.005
So the median is going to be

1448
00:59:42.005 --> 00:59:43.525
between the seventh and the eighth number.

1449
00:59:43.995 --> 00:59:45.125
That means the median is

1450
00:59:45.125 --> 00:59:46.245
gonna be in the middle of those two.

1451
00:59:46.385 --> 00:59:47.765
So I need to add those two together

1452
00:59:47.765 --> 00:59:49.685
and find the, the average of the two

1453
00:59:49.705 --> 00:59:51.005
or the average between two numbers

1454
00:59:51.005 --> 00:59:53.365
that are one apart is just gonna be 0.5 between them.

1455
00:59:53.785 --> 00:59:56.325
So as you can see here I go, I've got seven on the left,

1456
00:59:56.395 --> 00:59:57.565
I've got seven on the right.

1457
00:59:59.625 --> 01:00:01.795
Then I go, all right, what is my first quartile?

1458
01:00:01.795 --> 01:00:03.995
Well my first quartile is going to be the median

1459
01:00:03.995 --> 01:00:05.155
of those first seven numbers.

1460
01:00:05.535 --> 01:00:07.355
And the median of those first seven numbers is

1461
01:00:07.355 --> 01:00:08.435
going to be the fourth number.

1462
01:00:08.755 --> 01:00:11.075
'cause it'll be three on either side. So it'll be eight.

1463
01:00:11.865 --> 01:00:14.315
Same with the back group.

1464
01:00:14.335 --> 01:00:15.755
So the second 50%,

1465
01:00:16.135 --> 01:00:18.555
I'm gonna find my fourth number in there, which will be 21.

1466
01:00:18.855 --> 01:00:21.995
As you can see there. I now have my Q1, I have my median

1467
01:00:21.995 --> 01:00:24.755
and I have my Q3 and then my min and my max is really easy,

1468
01:00:25.015 --> 01:00:26.555
or my min is gonna be my smallest number

1469
01:00:26.555 --> 01:00:27.915
and my max is gonna be our biggest number.

1470
01:00:28.055 --> 01:00:30.515
So if it's in order, I just go to the go to the end,

1471
01:00:30.635 --> 01:00:31.715
I go to the front, I go to the end.
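
NOTE
Editor's sketch, not part of the spoken lecture: the split-the-data-in-half method described above.
The 14 values are hypothetical, since the slide data is not in the transcript.
They were chosen so the quartiles match the lecture's answers (Q1 = 8, Q3 = 21).
A calculator may use a slightly different quartile convention, so treat this as one possible approach.
```python
from statistics import median
# Hypothetical ordered data, invented for illustration: 14 values.
data = sorted([2, 5, 7, 8, 10, 12, 14, 15, 17, 19, 21, 24, 27, 30])
half = len(data) // 2                 # 7 values on each side of the middle
lower_half, upper_half = data[:half], data[-half:]
five_number_summary = (min(data), median(lower_half), median(data), median(upper_half), max(data))
print(five_number_summary)            # (2, 8, 14.5, 21, 30)
```
With 14 values the median falls between the 7th and 8th numbers, and each quartile is the 4th number of its half.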

1472
01:00:32.215 --> 01:00:35.755
Now you can do the same on a dot plot.

1473
01:00:35.755 --> 01:00:37.315
We already talked about the median before,

1474
01:00:37.415 --> 01:00:39.915
but you can do the exact same thing on a dot plot.

1475
01:00:40.175 --> 01:00:41.955
You have this information, you can go ahead

1476
01:00:41.975 --> 01:00:44.315
and do it, you can do it on a leaf

1477
01:00:44.315 --> 01:00:45.595
plot, you'll do all the same.

1478
01:00:46.335 --> 01:00:48.795
But the single best way

1479
01:00:48.795 --> 01:00:52.355
to represent a five figure summary is through a box plot.

1480
01:00:53.135 --> 01:00:57.185
So first of all, you need to figure out what your IQR is.

1481
01:00:57.245 --> 01:01:00.105
Now you might be looking at me and saying what's an IQR?

1482
01:01:00.105 --> 01:01:01.745
You've just jumped from box plots to IQRs.

1483
01:01:01.765 --> 01:01:05.185
Now box plots display IQR really, really well.

1484
01:01:05.285 --> 01:01:08.025
Now IQR stands for interquartile range.

1485
01:01:08.525 --> 01:01:12.145
The interquartile range is the range between your Q3

1486
01:01:12.205 --> 01:01:13.465
and your Q1.

1487
01:01:14.685 --> 01:01:16.385
So essentially

1488
01:01:16.385 --> 01:01:18.705
with this you can also do outlier calculations.

1489
01:01:18.725 --> 01:01:20.625
Now before I jump into outlier calculations,

1490
01:01:20.705 --> 01:01:21.985
I wanna show you what I mean by that.

1491
01:01:22.365 --> 01:01:23.825
So you've got a box plot here.

1492
01:01:23.845 --> 01:01:25.705
Now again, I think my um,

1493
01:01:26.105 --> 01:01:28.305
I think my axes have all gone a bit skewed when I um,

1494
01:01:28.655 --> 01:01:29.665
made these slides bigger.

1495
01:01:29.665 --> 01:01:31.905
And I do apologize that these were not fixed up.

1496
01:01:32.565 --> 01:01:36.665
But as you can see here, I've got my box plot here.

1497
01:01:36.845 --> 01:01:39.145
Now assuming this is spread out, we're going

1498
01:01:39.145 --> 01:01:40.585
to 26 and we've got one.

1499
01:01:41.335 --> 01:01:46.105
What you can see here is that your, your box plot

1500
01:01:46.865 --> 01:01:48.065
represents a couple of things.

1501
01:01:48.615 --> 01:01:52.865
It's represents first of all your, each

1502
01:01:52.865 --> 01:01:54.705
of your 25% ranges.

1503
01:01:54.805 --> 01:01:58.665
So it represents your minimum, your maximum,

1504
01:01:59.575 --> 01:02:04.065
your Q1, your median, and your Q3.

1505
01:02:04.165 --> 01:02:05.705
Now, where does it represent that? It represents

1506
01:02:05.705 --> 01:02:06.745
at each of these points here?

1507
01:02:07.905 --> 01:02:12.125
So the, the earliest point either being

1508
01:02:13.505 --> 01:02:16.765
the fence, so the fence, oh I've lost my mouse.

1509
01:02:16.765 --> 01:02:18.685
There it is. This fence here

1510
01:02:19.705 --> 01:02:23.465
or this minimum outlier

1511
01:02:23.485 --> 01:02:25.385
or an outlier will be your minimum point.

1512
01:02:25.385 --> 01:02:26.665
In this case here you've got an outlier.

1513
01:02:26.665 --> 01:02:28.225
So your outlier is your minimum point.

1514
01:02:28.895 --> 01:02:31.665
Your Q1 is this first line of the box.

1515
01:02:32.095 --> 01:02:34.145
Your median is the line in the middle of the box.

1516
01:02:34.295 --> 01:02:37.465
Your Q3 is your line at the end of the box.

1517
01:02:38.245 --> 01:02:40.985
Now your maximum again works the same way as the minimum.

1518
01:02:41.175 --> 01:02:44.305
It's either the fence, so the fence which is here,

1519
01:02:44.645 --> 01:02:46.665
or if you've got an outlier, it's the outlier.

1520
01:02:46.925 --> 01:02:48.825
So the outlier always takes precedence

1521
01:02:48.825 --> 01:02:50.745
because that's higher up or lower down.

1522
01:02:51.125 --> 01:02:53.785
But if there is no outlier, you utilize the fence.

1523
01:02:55.165 --> 01:02:58.425
Now as you can see here, these two are outliers,

1524
01:02:58.645 --> 01:03:01.305
but I think this is the most important point is your fences.

1525
01:03:01.605 --> 01:03:04.145
So we're gonna talk about fences in terms

1526
01:03:04.145 --> 01:03:05.585
of outlier fences in a second,

1527
01:03:06.085 --> 01:03:07.745
but these fences are different.

1528
01:03:09.455 --> 01:03:14.435
You will draw this fence here at the largest and smallest.

1529
01:03:14.815 --> 01:03:16.995
So at this end here, this will be the largest.

1530
01:03:17.495 --> 01:03:20.755
So this is the largest and this down here is the smallest.

1531
01:03:22.055 --> 01:03:26.155
Now the largest non outlier value,

1532
01:03:27.215 --> 01:03:30.835
please do not draw your fence at your outlier calculation,

1533
01:03:30.835 --> 01:03:32.355
which we'll discuss in one second.

1534
01:03:33.105 --> 01:03:35.825
This fence goes

1535
01:03:36.005 --> 01:03:37.545
and the one that I've got highlighted right now

1536
01:03:37.545 --> 01:03:41.465
with my mouse, this goes now at your smallest

1537
01:03:42.485 --> 01:03:44.905
non outlier piece of information.

1538
01:03:45.205 --> 01:03:48.545
Now, if you've got no outliers, this will be the minimum.

1539
01:03:49.245 --> 01:03:51.065
If you have outliers,

1540
01:03:51.525 --> 01:03:54.025
you use your next smallest piece of information.

1541
01:03:54.025 --> 01:03:56.385
So in this case here, I know this scale is wrong,

1542
01:03:56.445 --> 01:03:59.265
so we'll use these the the sort of little points,

1543
01:03:59.845 --> 01:04:03.515
1, 2, 3, 4. Four right now

1544
01:04:04.095 --> 01:04:08.425
is my smallest non my smallest

1545
01:04:09.725 --> 01:04:11.465
non outlier piece of information.

1546
01:04:12.085 --> 01:04:15.905
One is an outlier. So in this case here, one is an outlier.

1547
01:04:15.905 --> 01:04:17.665
Now this may be very wrong, it's just an example,

1548
01:04:18.205 --> 01:04:19.505
but one is an outlier.

1549
01:04:20.415 --> 01:04:22.925
However, four

1550
01:04:24.485 --> 01:04:25.585
is not an outlier.

1551
01:04:26.165 --> 01:04:29.185
Now my fence, my outlier, you know,

1552
01:04:29.195 --> 01:04:30.385
fence might have been at two.

1553
01:04:30.725 --> 01:04:31.905
So I might've done a calculation

1554
01:04:31.905 --> 01:04:33.625
and found that my outlier fence is two.

1555
01:04:34.055 --> 01:04:38.065
Therefore my outlier one is, well my value

1556
01:04:38.165 --> 01:04:40.625
of one was an outlier because it's less than two,

1557
01:04:41.445 --> 01:04:43.465
but I'm not drawing my fence at two,

1558
01:04:43.765 --> 01:04:46.945
I'm drawing my box fence at four

1559
01:04:46.945 --> 01:04:48.945
because that is my smallest value.

1560
01:04:48.945 --> 01:04:51.665
That is not an outlier. Really important.

1561
01:04:51.885 --> 01:04:53.225
At the other end, it's the same thing.

1562
01:04:53.405 --> 01:04:54.865
So let's say this goes to 26.

1563
01:04:55.205 --> 01:04:56.705
So I have an outlier at 26

1564
01:04:56.805 --> 01:05:00.465
and maybe my fence was at 24 maybe I did my outlier

1565
01:05:00.465 --> 01:05:03.345
calculation and I found that anything above 24

1566
01:05:03.365 --> 01:05:05.865
is an outlier, therefore 26 was an outlier.

1567
01:05:06.995 --> 01:05:10.695
But then my next biggest value was 23.

1568
01:05:10.955 --> 01:05:13.255
So I had a 23 and then I had a 26.

1569
01:05:13.445 --> 01:05:16.095
Well 23 therefore is gonna be my fence end

1570
01:05:16.095 --> 01:05:19.605
because that's my largest value outside

1571
01:05:20.105 --> 01:05:21.645
of my outliers.
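
NOTE
Editor's sketch, not part of the spoken lecture: where the drawn ends of the box plot go when outliers exist.
The outlier fences (2 and 24) and the values 1, 4, 23 and 26 come from the spoken example.
The middle values are hypothetical filler, and the fences are taken as given rather than computed from them.
```python
# Outlier fences from the spoken example: anything below 2 or above 24 is an outlier.
lower_fence, upper_fence = 2, 24
# Hypothetical data: only 1, 4, 23 and 26 come from the example, the rest is filler.
data = [1, 4, 9, 12, 15, 18, 23, 26]
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
print(outliers)                       # [1, 26]
print(whisker_low, whisker_high)      # 4 23
```
The drawn ends sit at 4 and 23, the smallest and largest values that are not outliers, not at the calculated fences of 2 and 24.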

1572
01:05:22.385 --> 01:05:24.485
Now how do I do this outlier calculation?

1573
01:05:24.545 --> 01:05:26.885
The reason why this box plot is actually like slightly wrong

1574
01:05:27.025 --> 01:05:30.365
is because even with the axes wrong, neither, none

1575
01:05:30.365 --> 01:05:31.445
of those values would be outliers.

1576
01:05:31.875 --> 01:05:33.645
Your outlier calculation looks like this.

1577
01:05:33.995 --> 01:05:37.005
Your lower fence, your lower outlier fence,

1578
01:05:37.505 --> 01:05:38.685
not the fence that you draw.

1579
01:05:38.835 --> 01:05:42.005
Your outlier fence is Q1,

1580
01:05:42.915 --> 01:05:45.725
take away 1.5 times IQR

1581
01:05:46.225 --> 01:05:49.125
and your upper outlier fence is Q3

1582
01:05:50.035 --> 01:05:52.765
plus 1.5 times IQR.

1583
01:05:53.965 --> 01:05:55.975
That is how you determine outliers.

1584
01:05:55.975 --> 01:05:58.375
This is a formula you need to have in your summary book

1585
01:05:58.955 --> 01:06:01.215
and it's really important you utilize it

1586
01:06:01.415 --> 01:06:02.735
whenever you draw out a box plot.
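
NOTE
Editor's sketch, not part of the spoken lecture: the outlier fence formula stated above.
It uses IQR = Q3 - Q1, lower fence = Q1 - 1.5 x IQR and upper fence = Q3 + 1.5 x IQR.
The function name outlier_fences is hypothetical.
```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
# Q1 = 8 and Q3 = 21 are the quartiles from the earlier 14-value example.
lower, upper = outlier_fences(8, 21)
print(lower, upper)                   # -11.5 40.5
```
Any value below the lower fence or above the upper fence counts as an outlier and is plotted as a separate point.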

1587
01:06:03.715 --> 01:06:05.495
Now the other useful thing

1588
01:06:05.495 --> 01:06:07.975
with box plots is it represents your 25, 25.

1589
01:06:08.635 --> 01:06:10.695
So given that this is your minimum

1590
01:06:10.835 --> 01:06:14.415
and this is your Q1, you know that 25% of the data is

1591
01:06:14.415 --> 01:06:15.735
between that point there.

1592
01:06:16.225 --> 01:06:17.335
Given this is Q1

1593
01:06:17.335 --> 01:06:20.415
and median, you know there's another 25% in here. Given this

1594
01:06:20.415 --> 01:06:21.735
is the median and the Q3,

1595
01:06:21.795 --> 01:06:23.735
you know there's another 25% in here.

1596
01:06:23.915 --> 01:06:26.135
So then you can say, well there's 75%

1597
01:06:26.135 --> 01:06:28.495
of information below whatever value Q3 is,

1598
01:06:28.795 --> 01:06:32.615
or there is 75% of information above five

1599
01:06:32.615 --> 01:06:33.855
because we know this is five.

1600
01:06:33.915 --> 01:06:38.895
So there was 1, 2, 3, 4, 5. So this line is at five.

1601
01:06:39.235 --> 01:06:42.015
So I know 75% of my data in this

1602
01:06:42.015 --> 01:06:45.095
that this box plot represents is above five.

1603
01:06:46.705 --> 01:06:49.765
Um, and then between Q3 and max there's another 25%.

1604
01:06:50.025 --> 01:06:51.725
So there's your a hundred percent if you add all those

1605
01:06:51.725 --> 01:06:52.805
25% up.

1606
01:06:53.025 --> 01:06:55.085
So that's your usefulness of a box plot.

1607
01:06:55.735 --> 01:06:56.835
Now the other thing

1608
01:06:56.835 --> 01:06:59.315
with box plots is they are commonly utilized

1609
01:06:59.615 --> 01:07:01.235
to discuss information

1610
01:07:01.695 --> 01:07:04.075
and you are commonly asked to describe,

1611
01:07:04.345 --> 01:07:06.035
this is the most common univariate piece

1612
01:07:06.035 --> 01:07:07.035
of information you'll be asked to describe.

1613
01:07:07.535 --> 01:07:09.435
You'll be asked to describe box plots in terms

1614
01:07:09.435 --> 01:07:11.555
of their shape, center, spread and outliers.

1615
01:07:11.935 --> 01:07:16.205
So outliers, are they present? If so, what are they?

1616
01:07:16.275 --> 01:07:19.725
Also note if there are no outliers. Center: note the mean

1617
01:07:19.785 --> 01:07:21.565
or the median, you can see that,

1618
01:07:21.835 --> 01:07:23.245
that you can see the median.

1619
01:07:23.305 --> 01:07:24.445
The median's usually the easiest.

1620
01:07:24.645 --> 01:07:26.165
I usually always go to the median

1621
01:07:26.195 --> 01:07:28.005
because you can see the median in a

1622
01:07:28.005 --> 01:07:29.125
box plot, it's the middle line.

1623
01:07:30.225 --> 01:07:34.925
You then have your mode as well. So your mode is less easy.

1624
01:07:34.985 --> 01:07:36.045
You need to have your raw data.

1625
01:07:36.115 --> 01:07:37.765
Same with the mean, you don't really have

1626
01:07:37.765 --> 01:07:39.445
to comment on them if you don't have the raw data,

1627
01:07:39.445 --> 01:07:41.245
we don't have the ability, usually the median is

1628
01:07:41.525 --> 01:07:43.165
sufficient. The spread.

1629
01:07:43.585 --> 01:07:45.125
So what is the range

1630
01:07:45.125 --> 01:07:47.405
of the data from your minimum to your maximum?

1631
01:07:48.065 --> 01:07:49.285
You don't wanna talk in,

1632
01:07:49.285 --> 01:07:50.925
you wanna include your outliers in that.

1633
01:07:50.925 --> 01:07:52.405
So you wanna go your minimum to your maximum.

1634
01:07:52.425 --> 01:07:54.245
You might also wanna talk about your IQR

1635
01:07:54.545 --> 01:07:56.285
or if you've been given a standard deviation,

1636
01:07:56.285 --> 01:07:57.445
we'll talk about that in a second.

1637
01:07:58.225 --> 01:08:02.525
Um, and then you also want to talk about, um, the shape.

1638
01:08:02.665 --> 01:08:04.365
Now the shape is a little bit more confusing.

1639
01:08:04.785 --> 01:08:06.285
The shape looks like this.

1640
01:08:06.665 --> 01:08:09.815
So if your, there's, I'll show you some other ones,

1641
01:08:09.815 --> 01:08:10.815
but this is a histogram.

1642
01:08:10.995 --> 01:08:13.855
If your histogram looks like this one over here on your

1643
01:08:13.855 --> 01:08:16.125
left, it means it's positively skewed.

1644
01:08:16.125 --> 01:08:18.005
It means it tails off into the positive.

1645
01:08:19.425 --> 01:08:20.835
It's negatively skewed if it's,

1646
01:08:21.015 --> 01:08:22.435
if it tails off into the negative.

1647
01:08:22.435 --> 01:08:24.875
Now it doesn't actually need to go into negative values,

1648
01:08:25.415 --> 01:08:27.915
it just can tail off towards the smaller values.

1649
01:08:28.055 --> 01:08:29.955
We call this negatively skewed,

1650
01:08:30.025 --> 01:08:31.675
approximately symmetrical is the other one.

1651
01:08:31.985 --> 01:08:33.555
Bimodal has never come up.

1652
01:08:33.895 --> 01:08:36.515
It was, it was put in their study design a couple years ago

1653
01:08:36.515 --> 01:08:38.755
and they removed it, but they never distinctly

1654
01:08:38.755 --> 01:08:39.795
said, you don't need to know it.

1655
01:08:39.815 --> 01:08:41.195
So we always sort of teach it even

1656
01:08:41.195 --> 01:08:42.355
though it's never really come up.

1657
01:08:43.775 --> 01:08:46.035
Um, so as you can see there, they look a bit like that.

1658
01:08:46.665 --> 01:08:48.955
Then we have our box plots, sorry, come back.

1659
01:08:49.185 --> 01:08:50.195
Then we have our box plots.

1660
01:08:50.255 --> 01:08:53.155
As you can see here, this is a positively skewed box plot.

1661
01:08:53.155 --> 01:08:55.395
Sorry, these ones have the correct uh, axes.

1662
01:08:55.395 --> 01:08:57.675
This is what the number and the axis wanted to look like.

1663
01:08:57.735 --> 01:08:59.315
Do apologize for me getting a bit squished.

1664
01:08:59.655 --> 01:09:00.875
Um, as you can see here,

1665
01:09:01.015 --> 01:09:02.675
we have a positively skewed box plot

1666
01:09:02.775 --> 01:09:04.035
as you can see on the next one.

1667
01:09:04.035 --> 01:09:05.555
You've got a negatively skewed box plot

1668
01:09:05.975 --> 01:09:07.515
and as I was trying to say, it never,

1669
01:09:07.545 --> 01:09:09.315
this box plot never goes into the negatives.

1670
01:09:09.465 --> 01:09:10.435
This doesn't actually, you know,

1671
01:09:10.845 --> 01:09:12.155
touch the negatives over here.

1672
01:09:12.155 --> 01:09:13.835
There might be negative values back here,

1673
01:09:14.135 --> 01:09:15.195
but it never touches it.

1674
01:09:15.255 --> 01:09:16.635
But it's still negatively skewed

1675
01:09:16.635 --> 01:09:20.595
because it's skewed towards the more, um,

1676
01:09:21.415 --> 01:09:22.755
the more smaller values.

1677
01:09:23.375 --> 01:09:24.315
And then as you can see here,

1678
01:09:24.315 --> 01:09:25.315
we've got approximately symmetrical.

1679
01:09:25.395 --> 01:09:26.955
'cause this box plot is pretty symmetrical.

1680
01:09:26.955 --> 01:09:28.435
There's not a lot of difference between them.

1681
01:09:28.735 --> 01:09:30.035
So that is essentially

1682
01:09:30.295 --> 01:09:32.075
how you describe the shape of box plots.

1683
01:09:33.625 --> 01:09:35.965
Um, and then lastly with numerical data, you need

1684
01:09:35.965 --> 01:09:37.365
to know normal distribution.

1685
01:09:37.755 --> 01:09:38.845
I'll touch on this quite briefly,

1686
01:09:38.945 --> 01:09:40.685
but normal distribution is essentially

1687
01:09:40.685 --> 01:09:43.405
where your mean equals your median, equals your mode.

1688
01:09:43.435 --> 01:09:45.405
It's used for numerical continuous data.

1689
01:09:46.345 --> 01:09:49.245
Um, and you have 50% of your data either side of it.

1690
01:09:49.245 --> 01:09:52.165
So it's like a perfect symmetrical piece of data.

1691
01:09:52.865 --> 01:09:55.365
Um, and it approaches zero on both ends.

1692
01:09:55.585 --> 01:09:57.605
Now normal distribution's really cool

1693
01:09:57.605 --> 01:09:59.805
because it's utilized for your ATARs.

1694
01:09:59.805 --> 01:10:03.445
So we normally distribute around a mean median mode of 30

1695
01:10:04.105 --> 01:10:06.885
and then we sort of distribute from there and then scaling.

1696
01:10:07.035 --> 01:10:08.565
Very different to that. Don't worry about scaling.

1697
01:10:09.945 --> 01:10:12.365
Um, but the best way to describe the spread of data is

1698
01:10:12.365 --> 01:10:14.725
through one, your mean, and then your standard deviation.

1699
01:10:15.385 --> 01:10:17.005
So your standard deviation is just a value

1700
01:10:17.075 --> 01:10:18.485
that is calculated in your calculator

1701
01:10:18.485 --> 01:10:19.765
and you don't need to worry how to do that.

1702
01:10:19.765 --> 01:10:21.125
You just have to do it in your calculator.

1703
01:10:21.185 --> 01:10:23.125
And you do that through your sheets most

1704
01:10:23.125 --> 01:10:24.525
of the time will actually just be given to you.

1705
01:10:25.165 --> 01:10:27.705
But in a question you're told, all right, my mean is five,

1706
01:10:27.705 --> 01:10:28.905
my standard deviation is two.

1707
01:10:29.255 --> 01:10:31.905
What would it mean? What value would I be if I was one

1708
01:10:32.065 --> 01:10:34.385
standard deviation above the mean or you go five plus two

1709
01:10:34.525 --> 01:10:36.385
or if I was two standard deviations above

1710
01:10:36.445 --> 01:10:39.505
or you go five plus two plus two, what if I was one

1711
01:10:39.505 --> 01:10:42.465
below five, take two, et cetera.
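
NOTE
A minimal sketch of the "add or subtract the standard deviation" step described
above, using the mean of 5 and standard deviation of 2 from the example. The
function name value_at is made up for illustration; it is not a calculator feature.
```python
def value_at(mean: float, sd: float, k: float) -> float:
    """Return the value k standard deviations from the mean (negative k is below)."""
    return mean + k * sd
# The lecture's example: mean 5, standard deviation 2.
for k in (1, 2, -1):
    print(k, value_at(5, 2, k))
```
Negative k gives values below the mean, so k = -1 is "five take two".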

1712
01:10:42.765 --> 01:10:43.785
And the reason we do that is

1713
01:10:43.785 --> 01:10:47.825
because then we can sort of say what percentage of you know,

1714
01:10:47.925 --> 01:10:50.945
if it was a mark, maybe it's a mark from a, from an exam

1715
01:10:51.085 --> 01:10:54.265
or a SAC, what percentage of your class are you ahead of?

1716
01:10:54.725 --> 01:10:56.385
If I do a mean and a standard deviation

1717
01:10:56.385 --> 01:10:58.425
of my class's scores, um,

1718
01:10:58.605 --> 01:11:01.745
and let's say it's normally distributed, um,

1719
01:11:02.085 --> 01:11:04.265
we normally distribute it, we can say, all right,

1720
01:11:04.575 --> 01:11:06.345
what percentage of my class am I above?

1721
01:11:06.455 --> 01:11:09.585
Well, if I'm one standard deviation above, I'm above,

1722
01:11:09.685 --> 01:11:14.165
you know, 50%, so 50% on this side plus 34, I'm

1723
01:11:14.165 --> 01:11:15.805
above 84% of my class.

1724
01:11:16.665 --> 01:11:19.165
Um, if I am two standard deviations above,

1725
01:11:19.165 --> 01:11:21.925
well 84 plus this 13.5,

1726
01:11:22.515 --> 01:11:26.525
well I'm now 97.5% of my class is

1727
01:11:26.525 --> 01:11:28.365
below me, et cetera.
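
NOTE
The running totals above (50 + 34 = 84, then 84 + 13.5 = 97.5) can be sketched
as a small lookup. The 34 and 13.5 figures are the ones quoted in the lecture;
the 2.35 for the third band is an assumption taken from the usual 68-95-99.7
rule, and the names BANDS and percent_below are hypothetical.
```python
# Percentage of data in each one-standard-deviation band on one side of the mean.
BANDS = [34.0, 13.5, 2.35]
def percent_below(k: int) -> float:
    """Approximate percentage of a normal distribution below mean + k*sd, for k from -3 to 3."""
    if not -3 <= k <= 3:
        raise ValueError("k must be between -3 and 3")
    area = sum(BANDS[:abs(k)])
    return 50.0 + area if k >= 0 else 50.0 - area
print(percent_below(1))
print(percent_below(2))
```
These are the rounded percentages used in class, not exact normal-distribution areas.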

1728
01:11:28.665 --> 01:11:30.445
Um, and what's really important is you need

1729
01:11:30.445 --> 01:11:32.325
to know these three lines

1730
01:11:32.545 --> 01:11:36.045
and this exact diagram, this exact diagram

1731
01:11:36.675 --> 01:11:38.685
must be in your summary book full stop.

1732
01:11:38.875 --> 01:11:40.485
This whole page needs to be in your summary

1733
01:11:40.545 --> 01:11:43.045
and I want you to screenshot this exact page

1734
01:11:43.045 --> 01:11:45.125
and put in, I want you to have this exact diagram,

1735
01:11:45.135 --> 01:11:46.285
screenshot it in, and then I

1736
01:11:46.285 --> 01:11:47.365
want you to write out the other parts.

1737
01:11:47.865 --> 01:11:50.965
Really important. You must have this information in your

1738
01:11:50.965 --> 01:11:52.445
summary book, bound reference.

1739
01:11:53.035 --> 01:11:54.925
Doesn't matter if you don't want it

1740
01:11:54.925 --> 01:11:55.965
there, it needs to be there.

1741
01:11:56.705 --> 01:11:58.045
Um, and why it needs to be there is

1742
01:11:58.045 --> 01:11:59.165
because you get asked questions like this.

1743
01:11:59.925 --> 01:12:02.965
A class of 24 students receives their science test results

1744
01:12:03.115 --> 01:12:05.485
with a mean of 32 and a standard deviation of two.

1745
01:12:06.565 --> 01:12:10.265
How many students received a mark between 28 and 32?

1746
01:12:10.615 --> 01:12:12.705
Well, how do I know how to answer this?

1747
01:12:12.705 --> 01:12:16.505
Well, I go back to this diagram here. What did it say?

1748
01:12:16.505 --> 01:12:18.225
It said I had a standard a mean of 32

1749
01:12:18.225 --> 01:12:19.385
and a standard deviation of two.

1750
01:12:19.575 --> 01:12:22.105
Well, a standard deviation one below is 30

1751
01:12:22.125 --> 01:12:24.065
and a standard deviation two below is 28.

1752
01:12:24.445 --> 01:12:27.705
So two standard deviations below is 28.

1753
01:12:27.765 --> 01:12:31.065
So I'm gonna be somewhere over here in between my purple

1754
01:12:31.165 --> 01:12:34.185
and my my blue and then ones.

1755
01:12:34.205 --> 01:12:37.105
And then on the mean is the other value, which is

1756
01:12:37.105 --> 01:12:38.145
gonna be just over here.

1757
01:12:38.565 --> 01:12:40.585
So what I need to do is add these two together.

1758
01:12:40.685 --> 01:12:43.985
So I need to say, all right, I have 34 plus 13.5.

1759
01:12:44.645 --> 01:12:48.545
So what I'm saying is there is 47.5% of my class

1760
01:12:48.545 --> 01:12:51.585
between those values and I was told there was 24 students

1761
01:12:51.965 --> 01:12:55.305
so I need to find 47.5% of those students

1762
01:12:55.605 --> 01:12:57.745
and round to the nearest student

1763
01:12:58.055 --> 01:13:02.065
because I'm not going to say I have half a student,

1764
01:13:02.575 --> 01:13:03.905
half a student doesn't exist.

1765
01:13:04.065 --> 01:13:05.145
I need a whole number.

1766
01:13:06.205 --> 01:13:10.705
So as you can see here, 32 take 28 was four, four divided

1767
01:13:10.725 --> 01:13:11.865
by two was two.

1768
01:13:12.245 --> 01:13:14.145
So I have two standard deviations, um,

1769
01:13:14.165 --> 01:13:15.785
two standard deviations under the mean.

1770
01:13:15.885 --> 01:13:20.825
So I need to add those two little areas. 34 plus 13.5, 47.5.

1771
01:13:21.445 --> 01:13:24.745
So then I multiply that percentage as a decimal.

1772
01:13:25.325 --> 01:13:27.985
By 24 I get 11.4,

1773
01:13:28.665 --> 01:13:30.325
cannot have 11.4 students

1774
01:13:30.395 --> 01:13:33.165
because it's less than 0.5 I round down.

1775
01:13:33.385 --> 01:13:35.565
So I've rounded down to 11 students.
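
NOTE
The worked example above can be checked with a short sketch. It assumes both
values sit a whole number of standard deviations from the mean (within three),
and it uses the rounded band percentages from the diagram; the 2.35 third band
and the name percent_between are assumptions added for illustration.
```python
BANDS = [34.0, 13.5, 2.35]  # percent in the 1st, 2nd and 3rd band on either side of the mean
def percent_between(low: float, high: float, mean: float, sd: float) -> float:
    """Percentage between two values that sit a whole number of sds from the mean (within 3)."""
    def signed_area(x: float) -> float:
        k = round((x - mean) / sd)  # number of standard deviations from the mean
        area = sum(BANDS[:abs(k)])
        return area if k >= 0 else -area
    return signed_area(high) - signed_area(low)
pct = percent_between(28, 32, mean=32, sd=2)
students = round(pct / 100 * 24)  # a count of students has to be a whole number
print(pct, students)
```
The same function also handles a range that straddles the mean, such as 28 to 34.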

1776
01:13:36.225 --> 01:13:37.405
Really important that you do that.

1777
01:13:37.405 --> 01:13:40.645
Please make sure you don't round, make sure you don't, um,

1778
01:13:41.215 --> 01:13:43.285
leave your answer as 11.4 students.

1779
01:13:44.065 --> 01:13:46.525
Um, there's not 11.4 students.

1780
01:13:46.665 --> 01:13:48.445
You need to use common sense in these questions

1781
01:13:48.745 --> 01:13:50.405
and say, all right, it's not a point.

1782
01:13:52.065 --> 01:13:53.485
Um, and then we have our Zed scores.

1783
01:13:53.785 --> 01:13:56.045
So our Zed scores are essentially a little bit different.

1784
01:13:56.045 --> 01:13:57.165
So we've done that first question,

1785
01:13:57.345 --> 01:13:59.565
but then it says Ben achieved a result of 35.

1786
01:13:59.755 --> 01:14:01.405
What is his standardized score?

1787
01:14:01.595 --> 01:14:04.605
Well, a standardized score as you can see here,

1788
01:14:05.035 --> 01:14:06.845
well actually this sec, this first question we haven't done.

1789
01:14:06.845 --> 01:14:08.245
This is between 28 and 34.

1790
01:14:08.245 --> 01:14:09.405
You're welcome to do that yourself.

1791
01:14:09.745 --> 01:14:11.925
So, but your Z score is a little bit different.

1792
01:14:12.285 --> 01:14:14.645
Achieved a result of 35. What is your standardized score?

1793
01:14:14.675 --> 01:14:18.005
Well, a standardized score is essentially your Z score.

1794
01:14:18.185 --> 01:14:20.645
So you would put it in this formula here.

1795
01:14:20.865 --> 01:14:24.685
So your actual score was 35, so my X would be 35.

1796
01:14:25.105 --> 01:14:27.925
So I'd be putting X equals 35 in my

1797
01:14:27.925 --> 01:14:29.765
calculator, so that's 35.

1798
01:14:30.355 --> 01:14:33.845
Then I would go to my X bar. Now X bar is your mean.

1799
01:14:33.945 --> 01:14:36.165
So I do 35, take 32.

1800
01:14:36.425 --> 01:14:37.805
So three would go on top

1801
01:14:38.515 --> 01:14:40.175
and then I get my standard deviation.

1802
01:14:40.175 --> 01:14:43.255
Well my standard deviation is two, so I put two underneath.

1803
01:14:43.255 --> 01:14:45.255
So I put three over two in my calculator.

1804
01:14:45.555 --> 01:14:47.015
I'm getting an answer of 1.5.
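
NOTE
A minimal sketch of the standardized score calculation just described: actual
score take the mean, all over the standard deviation. The function name z_score
is hypothetical; the numbers are Ben's from the question.
```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standardized score: how many standard deviations x is from the mean."""
    return (x - mean) / sd
print(z_score(35, 32, 2))
```
A positive result means the score is above the mean; a negative one means below.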

1805
01:14:49.155 --> 01:14:52.095
Um, so as you can see here,

1806
01:14:53.125 --> 01:14:55.495
then you also have this first question which is an extension

1807
01:14:55.495 --> 01:14:57.735
on that of that uh, question we did before.

1808
01:14:58.035 --> 01:14:59.935
I'm gonna give you your time. I want you to have a go at

1809
01:14:59.935 --> 01:15:01.055
that first question if you want.

1810
01:15:01.145 --> 01:15:03.015
We're not gonna go through it, but you're welcome to pause

1811
01:15:03.015 --> 01:15:04.975
and have a go you back.

1812
01:15:05.355 --> 01:15:06.575
But as you can see here, the answer

1813
01:15:06.575 --> 01:15:08.015
to the first question is 81.5.

1814
01:15:08.395 --> 01:15:10.455
And the answer to the second question is 1.5.

1815
01:15:10.755 --> 01:15:14.735
So that is what a Zed score is. So that is univariate data.

1816
01:15:14.795 --> 01:15:16.615
We probably spent a little bit more time on it than I wanted

1817
01:15:16.615 --> 01:15:18.415
to, but the reality is it's

1818
01:15:18.415 --> 01:15:19.455
the first thing you're gonna go through.

1819
01:15:19.675 --> 01:15:23.615
So make sure that you understand how all of those aspects

1820
01:15:23.615 --> 01:15:25.015
of univariate data work.

1821
01:15:25.315 --> 01:15:30.075
Now in our last sort of 40, 45 minutes,

1822
01:15:30.935 --> 01:15:32.635
we are going to smash through

1823
01:15:33.335 --> 01:15:34.995
the first half of bivariate.

1824
01:15:35.015 --> 01:15:37.435
So we're up to slide 86.

1825
01:15:37.435 --> 01:15:42.395
There's 124 slides I believe. Um, I did update it.

1826
01:15:42.395 --> 01:15:43.555
So I think there's 124.

1827
01:15:44.175 --> 01:15:47.035
Um, but what you will find is that in the slides

1828
01:15:47.035 --> 01:15:48.795
that are available in the week that this is,

1829
01:15:48.905 --> 01:15:50.595
this is out there, um, all those

1830
01:15:50.595 --> 01:15:51.955
who are watching on a x plus,

1831
01:15:51.955 --> 01:15:53.075
they'll still be available there.

1832
01:15:53.575 --> 01:15:55.195
Um, you'll see that the slides go

1833
01:15:55.195 --> 01:15:57.915
to about 150 something in the PDF.

1834
01:15:58.335 --> 01:16:00.555
That's because the last 30 slides I've left

1835
01:16:00.565 --> 01:16:03.875
after our final slide are covering our seasonal.

1836
01:16:04.375 --> 01:16:06.515
Now we will cover that in the autumn lecture series,

1837
01:16:06.625 --> 01:16:09.415
however I wanted

1838
01:16:09.415 --> 01:16:11.175
to put it there in case people have their SAC

1839
01:16:11.175 --> 01:16:12.495
before the autumn lecture series

1840
01:16:12.555 --> 01:16:13.855
and people wanted to go through it.

1841
01:16:14.115 --> 01:16:15.495
You're welcome to go through it yourself.

1842
01:16:15.595 --> 01:16:16.695
I'm not gonna discuss it with you

1843
01:16:16.775 --> 01:16:18.655
'cause I'm not gonna do it in this chat

1844
01:16:19.915 --> 01:16:21.375
or in this, this sort of lecture.

1845
01:16:21.955 --> 01:16:23.415
But the slides are there.

1846
01:16:23.415 --> 01:16:24.575
So you're welcome to look at the

1847
01:16:24.575 --> 01:16:25.775
slides and have a read through them.

1848
01:16:25.775 --> 01:16:29.015
They're pretty sort of detailed in terms

1849
01:16:29.015 --> 01:16:30.855
of they discuss exactly what you need to know.

1850
01:16:31.275 --> 01:16:33.215
Um, so in that sense they're pretty good.

1851
01:16:33.315 --> 01:16:36.215
You don't need to um, worry about trying

1852
01:16:36.215 --> 01:16:37.615
to decipher what's going on in them.

1853
01:16:37.845 --> 01:16:39.135
They're pretty good for that.

1854
01:16:40.595 --> 01:16:45.325
Alright, so univariate data is great at telling us what, so

1855
01:16:45.355 --> 01:16:47.125
what is the average height of people in this room?

1856
01:16:47.315 --> 01:16:49.365
What is the most popular color, et cetera.

1857
01:16:49.425 --> 01:16:52.965
So that's what univariate data is used for. Bivariate data is great

1858
01:16:52.965 --> 01:16:55.245
for comparing and answering why.

1859
01:16:56.065 --> 01:16:58.805
So what is the relationship between age and height?

1860
01:16:59.115 --> 01:17:00.965
Does gender play a role in someone's favorite color?

1861
01:17:01.305 --> 01:17:03.645
How do the average temperatures in all major cities

1862
01:17:03.645 --> 01:17:05.885
compare? So you are comparing two things.

1863
01:17:06.865 --> 01:17:11.355
So bivariate data, when we've got more than one variable,

1864
01:17:11.415 --> 01:17:12.635
we give the variables different names.

1865
01:17:12.895 --> 01:17:14.955
So science kids will know this is the independent,

1866
01:17:14.955 --> 01:17:17.315
dependent. For general maths,

1867
01:17:17.315 --> 01:17:19.195
we call this explanatory and response.

1868
01:17:19.425 --> 01:17:21.515
They are the exact same thing. We just use different names.

1869
01:17:22.475 --> 01:17:24.605
Explanatory is known as the independent variable

1870
01:17:24.705 --> 01:17:27.645
or in general we are gonna call it

1871
01:17:28.115 --> 01:17:29.725
explanatory variable, EV.

1872
01:17:30.435 --> 01:17:32.085
This is plotted on the X axis.

1873
01:17:32.085 --> 01:17:34.525
The explanatory variable explains what's going on.

1874
01:17:35.875 --> 01:17:39.575
So this therefore causes a change in the response variable.

1875
01:17:39.755 --> 01:17:43.175
So a good way of thinking about this is height versus age

1876
01:17:43.755 --> 01:17:46.175
or age is going to be your explanatory variable.

1877
01:17:46.285 --> 01:17:48.135
Your height does not explain your age,

1878
01:17:48.275 --> 01:17:50.535
but your age does explain your height.

1879
01:17:50.965 --> 01:17:52.535
Another example is shoe size.

1880
01:17:52.885 --> 01:17:54.895
Your shoe size does not explain your age,

1881
01:17:55.115 --> 01:17:57.455
but your age does explain your shoe size.

1882
01:17:57.715 --> 01:18:00.655
As you grow up, your shoe size gets bigger, it's not

1883
01:18:00.655 --> 01:18:01.935
as your shoe size gets bigger,

1884
01:18:02.195 --> 01:18:04.215
you are growing up it may not be that.

1885
01:18:04.795 --> 01:18:08.335
Um, so it's really important to understand that your age is

1886
01:18:08.405 --> 01:18:09.975
what changes that.

1887
01:18:10.395 --> 01:18:14.095
So your response variable, which is your RV

1888
01:18:14.115 --> 01:18:16.055
or known in science as your dependent variable.

1889
01:18:16.075 --> 01:18:19.095
But we are going to use RV, not DV,

1890
01:18:19.105 --> 01:18:20.735
we're gonna use RV in general

1891
01:18:20.785 --> 01:18:24.135
maths. It's the variable you think will be changed as a response.

1892
01:18:25.355 --> 01:18:29.135
So age to shoe size, it's not shoe size to age.

1893
01:18:29.875 --> 01:18:32.895
So this is another really good uh, summary

1894
01:18:33.235 --> 01:18:36.495
of the different types of graphs you need to know.

1895
01:18:36.495 --> 01:18:37.735
This is a great little summary.

1896
01:18:38.355 --> 01:18:42.655
It shows you each of the sort of

1897
01:18:43.385 --> 01:18:47.575
major types of graphs and what the explanatory variable is

1898
01:18:47.835 --> 01:18:49.775
and what the response variable is.

1899
01:18:50.355 --> 01:18:53.575
So you've got a explanatory variable

1900
01:18:54.195 --> 01:18:57.255
of categorical versus response, variable of categorical

1901
01:18:57.275 --> 01:18:59.535
for segmented bar charts or two-way frequency tables.

1902
01:18:59.555 --> 01:19:01.935
And then we move into things like parallel box plots which

1903
01:19:01.935 --> 01:19:03.055
use the numerical stem

1904
01:19:03.055 --> 01:19:05.535
and leaf plots back to back which use a bit of numerical.

1905
01:19:05.835 --> 01:19:09.335
And then we use scatterplots, which are numerical

1906
01:19:09.335 --> 01:19:11.655
and numerical scatterplots are easily the most useful

1907
01:19:11.675 --> 01:19:14.695
and are what you're gonna spend the most time on in further,

1908
01:19:15.195 --> 01:19:17.335
um, they are going to be the most challenging thing.

1909
01:19:17.875 --> 01:19:20.895
Um, but you are gonna get pretty used to them

1910
01:19:21.035 --> 01:19:22.775
and you are gonna sort of end up,

1911
01:19:23.045 --> 01:19:25.375
most people end up enjoying them because they are difficult,

1912
01:19:25.375 --> 01:19:27.255
but they do sort of make sense in the end.

1913
01:19:28.225 --> 01:19:31.285
So we won't spend a lot of time on these graphs.

1914
01:19:31.285 --> 01:19:32.725
We'll spend a lot of time on scatter plot.

1915
01:19:32.725 --> 01:19:33.725
So we'll probably over the just,

1916
01:19:33.775 --> 01:19:34.805
it'll probably take us 10 minutes

1917
01:19:35.145 --> 01:19:37.645
and we're gonna blast through the next sort of 10

1918
01:19:37.745 --> 01:19:38.805
to 15 slides.

1919
01:19:38.905 --> 01:19:39.925
And then the last 10

1920
01:19:39.925 --> 01:19:43.245
to 15 slides will take us a good 25 minutes to go through.

1921
01:19:43.425 --> 01:19:47.475
So as you can see here,

1922
01:19:47.735 --> 01:19:50.555
we have um, segmented bar charts.

1923
01:19:50.575 --> 01:19:52.515
So segmented bar charts are essentially

1924
01:19:52.515 --> 01:19:55.155
where we compare categorical data with categorical data.

1925
01:19:55.215 --> 01:19:57.235
Now because they are segmenting it by variant,

1926
01:19:57.375 --> 01:19:58.435
we use different colors.

1927
01:19:58.575 --> 01:20:00.995
You know how before I said in our bar charts

1928
01:20:00.995 --> 01:20:03.035
for our univariate categorical data, I want you

1929
01:20:03.035 --> 01:20:04.755
to use one color in this case

1930
01:20:04.755 --> 01:20:05.955
here I want you to use more than one.

1931
01:20:06.775 --> 01:20:08.585
As you can see, I have the year.

1932
01:20:09.045 --> 01:20:11.545
So the year is a categorical ordinal

1933
01:20:11.545 --> 01:20:13.545
because it has an order and it's categorical.

1934
01:20:13.765 --> 01:20:15.705
And then I have cold, mild, hot. Once again,

1935
01:20:15.705 --> 01:20:18.305
categorical ordinal because we have an order.

1936
01:20:18.445 --> 01:20:19.705
Now the year is a number, yes,

1937
01:20:19.725 --> 01:20:22.875
but it describes a time period.

1938
01:20:23.985 --> 01:20:26.635
Cold, mild, hot describes a temperature.

1939
01:20:27.335 --> 01:20:31.635
So the number of days at a temperature level.

1940
01:20:31.775 --> 01:20:32.915
Now we've got a frequency.

1941
01:20:32.975 --> 01:20:37.045
Now really important segmented bar charts usually require,

1942
01:20:37.145 --> 01:20:38.525
are required to be in a frequency.

1943
01:20:38.665 --> 01:20:40.525
Yes, you can a percentage frequency, sorry,

1944
01:20:40.545 --> 01:20:43.205
you can do it in just a general frequency

1945
01:20:43.505 --> 01:20:44.925
and they'll all be different heights.

1946
01:20:45.225 --> 01:20:47.485
But the best way to do it is to do a percentage frequency.

1947
01:20:47.645 --> 01:20:49.965
'cause then you can directly compare each category.

1948
01:20:51.115 --> 01:20:53.815
Now the next point is they need to be in the same order.

1949
01:20:54.315 --> 01:20:56.935
So you need to have hot at the bottom, mild in the middle,

1950
01:20:56.995 --> 01:20:58.775
and cold at the top for all three of them.

1951
01:20:58.955 --> 01:21:01.295
Now if you wanted to do cold, then hot,

1952
01:21:01.295 --> 01:21:02.535
then mild doesn't matter.

1953
01:21:03.395 --> 01:21:07.475
It just needs to be each of the three of them needs

1954
01:21:07.475 --> 01:21:09.315
to have them in the same order.

1955
01:21:09.375 --> 01:21:10.675
If they're not in the same order,

1956
01:21:10.825 --> 01:21:13.275
then it's just wrong, throw it out the door.

1957
01:21:14.375 --> 01:21:18.275
The other thing is it needs to have color coding for each

1958
01:21:18.335 --> 01:21:20.835
of the different categories that are in the bar.

1959
01:21:21.455 --> 01:21:25.475
So as you can see here, this is your explanatory variable.

1960
01:21:26.225 --> 01:21:27.595
This is your response variable.

1961
01:21:27.595 --> 01:21:28.995
Your response variables need

1962
01:21:28.995 --> 01:21:30.995
to have different colors accordingly.

1963
01:21:31.455 --> 01:21:34.595
So cold here, they've given blue, mild they've given green,

1964
01:21:34.695 --> 01:21:35.915
red hot doesn't really matter.

1965
01:21:35.915 --> 01:21:37.515
The colors sort of match up with it.

1966
01:21:37.815 --> 01:21:39.715
But nonetheless the colors need to be the same.

1967
01:21:39.815 --> 01:21:42.915
So for 2010, 11 and 12 hot needs to be red

1968
01:21:42.935 --> 01:21:44.195
and there needs to be a key.

1969
01:21:44.705 --> 01:21:49.235
Same with mild needs to be green for each of the three

1970
01:21:49.235 --> 01:21:50.355
of them cold, it needs to be

1971
01:21:50.355 --> 01:21:51.475
blue for each of the three of them.

1972
01:21:51.775 --> 01:21:53.235
And notice how, because they're frequency,

1973
01:21:53.235 --> 01:21:54.275
they're really easy to compare.

1974
01:21:54.275 --> 01:21:56.115
So percentage frequency, they're really easy to compare.

1975
01:21:56.285 --> 01:21:57.795
Percentage frequency, they all need to get

1976
01:21:57.795 --> 01:21:59.515
to a hundred percent, which makes sense.

1977
01:21:59.705 --> 01:22:03.505
It's a hundred percent data. And then as you can see,

1978
01:22:03.505 --> 01:22:04.825
you can sort of compare these directly.

1979
01:22:04.825 --> 01:22:06.345
You can say all right, oh this is a hundred to,

1980
01:22:06.345 --> 01:22:07.785
you know, probably like 76.

1981
01:22:07.785 --> 01:22:10.905
So it's like 24% of the year 2010 was cold.

1982
01:22:11.375 --> 01:22:15.785
Whereas in 2011 it was like, maybe this is like 87,

1983
01:22:15.965 --> 01:22:17.425
so it's like 13%.

1984
01:22:17.965 --> 01:22:21.365
And then here you could say it was maybe like, um,

1985
01:22:21.465 --> 01:22:22.845
that's maybe down to like 81.

1986
01:22:22.845 --> 01:22:24.805
So you could say like 19% in 2012.

1987
01:22:24.805 --> 01:22:26.565
So it's a really easy way of comparing.

1988
01:22:26.595 --> 01:22:28.245
It's a really nice way of comparing.

1989
01:22:28.625 --> 01:22:30.805
Um, and then you go through it like that.

1990
01:22:30.985 --> 01:22:34.405
So this is a really nice little little graph to be able to,

1991
01:22:34.425 --> 01:22:36.005
to draw and to be able to utilize.

1992
01:22:36.465 --> 01:22:39.925
But you really, really need to know that you do need a key.

1993
01:22:40.225 --> 01:22:42.405
Really important. If you don't have a key, it is wrong.

1994
01:22:42.515 --> 01:22:44.205
There's no, there's no

1995
01:22:44.205 --> 01:22:45.445
explanation as to what's going on here.

1996
01:22:45.745 --> 01:22:48.165
Now there is, you need to have a key.

1997
01:22:50.985 --> 01:22:52.325
Now two way frequency tables.

1998
01:22:52.775 --> 01:22:56.725
Again, these are not all that useful.

1999
01:22:57.185 --> 01:22:58.485
Now they are, but they're not.

2000
01:22:58.815 --> 01:23:01.725
Again, we wanna avoid frequency tables at all costs.

2001
01:23:01.945 --> 01:23:06.205
So frequency tables are last resort, just like in univa,

2002
01:23:06.345 --> 01:23:08.485
in bi-variate, we don't wanna use frequency tables.

2003
01:23:08.825 --> 01:23:11.365
So frequency tables can be two way in a sense

2004
01:23:11.995 --> 01:23:15.125
that we have our response variable on our rows.

2005
01:23:15.625 --> 01:23:18.605
So for and against. And we have our, um,

2006
01:23:18.605 --> 01:23:20.165
explanatory variable on our columns.

2007
01:23:20.305 --> 01:23:22.365
So the year level is gonna explain the for

2008
01:23:22.365 --> 01:23:24.565
and against maybe in policy change at the school,

2009
01:23:24.575 --> 01:23:27.325
maybe it's got to do with year 12 jumpers or something.

2010
01:23:27.585 --> 01:23:31.285
Um, you're obviously going to have more people in the "for"

2011
01:23:31.825 --> 01:23:33.845
for year 12 because it affects year 12.

2012
01:23:33.915 --> 01:23:36.245
Like they, they want it, whereas against

2013
01:23:36.345 --> 01:23:38.485
as year elevens are generally against it

2014
01:23:38.605 --> 01:23:40.525
'cause it might be, you know, detrimental to them.

2015
01:23:40.585 --> 01:23:42.165
So they're more likely to be against it.

2016
01:23:42.185 --> 01:23:44.325
So that's why the year level is explanatory.

2017
01:23:44.665 --> 01:23:48.655
And the attitude is the response really important.

2018
01:23:48.655 --> 01:23:50.975
With these graphs, again, please try

2019
01:23:50.975 --> 01:23:51.975
and do them as a percentage.

2020
01:23:52.195 --> 01:23:53.855
Um, you can do them as raw data,

2021
01:23:53.995 --> 01:23:55.855
but you, the percentage is better.

2022
01:23:56.315 --> 01:23:59.215
Um, just because you may have different numbers of year 11

2023
01:23:59.235 --> 01:24:00.695
and different numbers of year 12.

2024
01:24:00.835 --> 01:24:02.335
So then your data looks a bit weird

2025
01:24:02.975 --> 01:24:05.295
'cause it might look like there might be way more year

2026
01:24:05.295 --> 01:24:07.775
twelves that answered it than there are year 11.

2027
01:24:07.835 --> 01:24:10.775
So then your 19% might actually be

2028
01:24:10.775 --> 01:24:12.135
as a raw data value like 20,

2029
01:24:12.435 --> 01:24:15.255
but then your 64% of year elevens might

2030
01:24:15.255 --> 01:24:16.655
as a raw data be like 18.

2031
01:24:17.235 --> 01:24:18.935
So then it looks like you might look at it

2032
01:24:18.935 --> 01:24:21.835
and be like, oh, there's actually more people

2033
01:24:21.835 --> 01:24:22.955
against it in the year twelves.

2034
01:24:22.955 --> 01:24:25.515
But if you look at that as a percentage, it's a lot less.
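
NOTE
A rough sketch of why percentages beat raw counts in a two-way frequency table.
The counts are hypothetical, chosen only so the "against" percentages land near
the 64% and 19% mentioned above; column_percentages is a made-up helper name.
```python
# Hypothetical raw counts: columns are year levels (explanatory), rows are attitudes (response).
table = {
    "Year 11": {"for": 10, "against": 18},
    "Year 12": {"for": 85, "against": 20},
}
def column_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Percentage of each explanatory-variable group (column) giving each response."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {response: round(100 * n / total, 1) for response, n in counts.items()}
    return result
for group, percentages in column_percentages(table).items():
    print(group, percentages)
```
In raw counts more Year 12s are against (20 versus 18), but as a share of their year level far fewer are.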

2035
01:24:26.175 --> 01:24:30.955
So please try and use percentages, um, rather than raw data.

2036
01:24:31.105 --> 01:24:33.435
Once again, we always wanna use percentages over raw data.

2037
01:24:34.475 --> 01:24:36.415
Um, so what can we see from this?

2038
01:24:36.755 --> 01:24:39.055
So you can discuss where are the associations.

2039
01:24:39.715 --> 01:24:42.535
Um, if it was random, we would expect percentage to be

2040
01:24:42.535 --> 01:24:43.975
around 50 50, but it's not.

2041
01:24:44.115 --> 01:24:46.615
So therefore we can discuss that there is an association

2042
01:24:46.615 --> 01:24:49.455
between these things and that's why as much

2043
01:24:50.035 --> 01:24:51.655
as frequency tables aren't as useful,

2044
01:24:51.655 --> 01:24:53.215
that they can still display this stuff

2045
01:24:53.215 --> 01:24:57.535
and be useful, um, in terms of looking at what's going on.

2046
01:24:59.655 --> 01:25:02.515
So moving forward,

2047
01:25:04.455 --> 01:25:06.955
uh, back to back stem and leaf plots.

2048
01:25:07.015 --> 01:25:08.035
So back-to-back stem

2049
01:25:08.035 --> 01:25:11.675
and leaf plots can use two pieces of categorical data

2050
01:25:12.335 --> 01:25:14.675
and they can use numerical data.

2051
01:25:15.215 --> 01:25:17.995
So, um, it's a bit of an odd one here

2052
01:25:17.995 --> 01:25:21.795
because in this sense here we might be looking at, I dunno,

2053
01:25:22.135 --> 01:25:25.635
age or maybe, you know, diameter of

2054
01:25:26.295 --> 01:25:27.515
of the iris or something.

2055
01:25:27.515 --> 01:25:29.395
It's, it's a bit weird. I know why we used eye color

2056
01:25:29.415 --> 01:25:32.915
and then I don't know what, whatever the,

2057
01:25:33.395 --> 01:25:35.115
whatever the numerical piece of data is here,

2058
01:25:35.145 --> 01:25:36.355
it's a bit of a weird example.

2059
01:25:36.515 --> 01:25:39.155
I don't dunno why we use this exact example.

2060
01:25:39.935 --> 01:25:43.905
Um, but as you can see here, we've got a key that says one

2061
01:25:43.925 --> 01:25:45.545
to zero equals 10,

2062
01:25:45.765 --> 01:25:48.105
and then we've got our blue eyes versus our brown,

2063
01:25:48.205 --> 01:25:50.345
our brown eye, our brown eyes.

2064
01:25:50.725 --> 01:25:53.145
So bit of an odd example,

2065
01:25:53.205 --> 01:25:55.065
but essentially your,

2066
01:25:55.595 --> 01:25:58.025
let's just say this is age, for some reason.

2067
01:25:58.365 --> 01:26:00.265
Um, so we're looking at people at a hundred years old

2068
01:26:00.265 --> 01:26:01.665
or 58 years old as our lowest.

2069
01:26:02.075 --> 01:26:04.205
We'll say it's age and we're looking at the eye color.

2070
01:26:04.265 --> 01:26:08.325
You can only have two, um, categories here.

2071
01:26:08.345 --> 01:26:10.205
You can't have more than more than two.

2072
01:26:10.505 --> 01:26:12.325
If you've got more than two, you can't use a stem

2073
01:26:12.345 --> 01:26:13.965
and leaf plot, or a back-to-back

2074
01:26:13.965 --> 01:26:15.085
stem-and-leaf plot, as it's called.

2075
01:26:15.385 --> 01:26:17.805
And then our numerical data is the age, you know,

2076
01:26:17.805 --> 01:26:21.365
they're 58 years old, 61 years old, 60 years old, et cetera.

2077
01:26:21.665 --> 01:26:24.085
Um, and we're trying to see if there's an association. It's a bit

2078
01:26:24.085 --> 01:26:26.205
of a weird one, but essentially that's

2079
01:26:26.205 --> 01:26:27.365
what they're doing there.

2080
01:26:29.765 --> 01:26:33.345
Um, and then as you can see here, we also have our,

2081
01:26:33.885 --> 01:26:36.345
um, parallel dot plots.

2082
01:26:36.345 --> 01:26:38.705
Now with parallel dot plots, you can have more than two categories.

2083
01:26:38.725 --> 01:26:41.305
For every category in your categorical data,

2084
01:26:41.565 --> 01:26:43.185
you've just gotta have another dot plot.

2085
01:26:44.375 --> 01:26:47.115
Key point with dot plots, they must be on the same

2086
01:26:47.665 --> 01:26:50.035
axis in terms of the numbers must match up.

2087
01:26:50.135 --> 01:26:51.955
So they're not on the same axis in terms

2088
01:26:51.955 --> 01:26:54.195
of you draw a line each time you have a category.

2089
01:26:54.535 --> 01:26:56.235
So you do a line for boys, you do a line for girls,

2090
01:26:56.235 --> 01:26:59.315
you do a line for, you know, whatever the next one is.

2091
01:26:59.775 --> 01:27:02.555
Um, like if you're not using genders in a sense,

2092
01:27:02.555 --> 01:27:07.305
you could use um, say year levels again.

2093
01:27:07.365 --> 01:27:11.225
So your year 10, uh, year 10, year 11, year 12.

2094
01:27:11.805 --> 01:27:13.425
You just write a third line up here.

2095
01:27:13.725 --> 01:27:17.145
So as you can see, you just keep writing a line every time,

2096
01:27:17.805 --> 01:27:20.305
um, and or drawing out a line each time

2097
01:27:20.325 --> 01:27:21.505
and then putting your data on.

2098
01:27:21.885 --> 01:27:23.385
But the lines need to match up.

2099
01:27:23.645 --> 01:27:25.225
As you can see here, this could be ages,

2100
01:27:25.255 --> 01:27:26.865
this could be boys and girls.

2101
01:27:26.965 --> 01:27:29.345
And then this could be the ages of the boys

2102
01:27:29.345 --> 01:27:32.065
and girls that are, you know, swimming at a swimming club.

2103
01:27:32.065 --> 01:27:33.545
This could be like a swimming club or something like that.

2104
01:27:34.165 --> 01:27:36.865
Um, and as you can see here, you've got your boys,

2105
01:27:36.965 --> 01:27:39.185
you've got your girls and you've got your ages

2106
01:27:39.715 --> 01:27:41.105
along this line here.

2107
01:27:41.765 --> 01:27:45.545
And as you can see, all of the values match up.

2108
01:27:45.565 --> 01:27:49.905
So you've got 11, 11, you've got 10, you've got 10, yes,

2109
01:27:50.125 --> 01:27:52.985
the actual values are different, like the actual data

2110
01:27:54.125 --> 01:27:55.185
and the frequency of each

2111
01:27:55.185 --> 01:27:57.225
because you're measuring, you know,

2112
01:27:57.225 --> 01:27:59.705
you've counted in different categories, um,

2113
01:28:00.205 --> 01:28:02.745
but your axis lines up and it's really important.

2114
01:28:02.745 --> 01:28:04.425
Your axis has to line up.

2115
01:28:04.445 --> 01:28:07.015
You cannot have different axes.

2116
01:28:07.015 --> 01:28:08.455
You can't start this axis at eight

2117
01:28:08.455 --> 01:28:09.655
rather than starting this one at nine.

2118
01:28:12.075 --> 01:28:15.855
Um, and then lastly, probably, um,

2119
01:28:16.795 --> 01:28:19.855
our most important one outside of our

2120
01:28:20.605 --> 01:28:22.815
scatter plots is our box plots.

2121
01:28:22.955 --> 01:28:26.295
And our box plots generally get asked to,

2122
01:28:26.355 --> 01:28:27.455
we generally get asked to

2123
01:28:27.455 --> 01:28:28.655
discuss the differences between them.

2124
01:28:28.715 --> 01:28:31.335
So comparing box plots,

2125
01:28:31.485 --> 01:28:33.815
because we always get asked to discuss, you know,

2126
01:28:33.915 --> 01:28:36.095
box plots in our univariate area, they always ask us

2127
01:28:36.095 --> 01:28:37.975
to discuss box plots in our bivariate area.

2128
01:28:38.035 --> 01:28:39.815
It just happens to be the way things work.

2129
01:28:39.885 --> 01:28:41.375
They just like to do it like that.

2130
01:28:41.835 --> 01:28:43.415
Um, and we sort of just have to get used to it.

2131
01:28:44.415 --> 01:28:47.595
So as you can see here, when we have our box plots here,

2132
01:28:47.595 --> 01:28:49.275
let's just say this is in this case here,

2133
01:28:49.275 --> 01:28:52.035
we've used our boys and our girls and we're using a score.

2134
01:28:52.525 --> 01:28:53.995
Maybe it's like a beep test or something.

2135
01:28:53.995 --> 01:28:55.755
I don't know what it is. Nonetheless, um,

2136
01:28:55.775 --> 01:28:57.115
if you only got 26 on a beep test,

2137
01:28:57.115 --> 01:28:59.555
you haven't gone far. Nonetheless,

2138
01:28:59.895 --> 01:29:02.395
Um, you've got a a, a boys,

2139
01:29:02.395 --> 01:29:03.835
one at the top and a girls' one at the bottom.

2140
01:29:03.855 --> 01:29:07.075
Now, really important about our parallel box plots:

2141
01:29:07.095 --> 01:29:09.475
We have one axis. So you can't actually, you know,

2142
01:29:09.505 --> 01:29:11.835
have different axes and then make that mistake.

2143
01:29:11.855 --> 01:29:14.235
You have one axis and you just draw them on top

2144
01:29:14.235 --> 01:29:15.995
of each other and then you just write

2145
01:29:16.015 --> 01:29:18.435
to the side which one it is or label each one.

2146
01:29:18.615 --> 01:29:20.675
So as you can see, they're all on the same axis.

2147
01:29:20.945 --> 01:29:22.995
Therefore you cannot get the axis side

2148
01:29:22.995 --> 01:29:24.515
of things wrong like you potentially can

2149
01:29:24.515 --> 01:29:25.595
with the parallel dot plot.

2150
01:29:26.375 --> 01:29:30.035
In this one here as well, we use, uh, categorical

2151
01:29:30.135 --> 01:29:31.275
and numerical data.

2152
01:29:31.485 --> 01:29:33.275
Again, numerical data is on that axis

2153
01:29:33.275 --> 01:29:34.915
and the categorical data is each box plot.

2154
01:29:34.915 --> 01:29:36.875
So for each category we have another box plot.

2155
01:29:36.895 --> 01:29:39.275
You can have as many box plots as you want.

2156
01:29:39.895 --> 01:29:41.835
Um, that's up to you, you're welcome to do it.

2157
01:29:42.675 --> 01:29:44.235
A really common thing for people to do with this

2158
01:29:44.235 --> 01:29:45.635
as well is to put the axis in the middle.

2159
01:29:45.635 --> 01:29:46.635
If you've got two categories,

2160
01:29:46.935 --> 01:29:47.995
they put the axis in the middle

2161
01:29:47.995 --> 01:29:49.275
and they put one box plot below

2162
01:29:49.275 --> 01:29:52.195
and one box plot above. You're welcome to do that.

2163
01:29:52.345 --> 01:29:54.275
There's nothing against it. It looks kind of cool.

2164
01:29:54.275 --> 01:29:56.955
Sometimes it's easier to read. Happy for you to do that.

2165
01:29:58.315 --> 01:29:59.605
Then this question here, they want you

2166
01:29:59.605 --> 01:30:01.885
to discuss the exact points we discussed earlier,

2167
01:30:02.025 --> 01:30:03.525
but they want you to compare it.

2168
01:30:03.745 --> 01:30:07.605
So as you can see here, this response doesn't give context

2169
01:30:07.785 --> 01:30:09.085
and that's why I don't like this response.

2170
01:30:09.125 --> 01:30:10.285
I think it needs to give context first.

2171
01:30:10.285 --> 01:30:13.765
So you need to say first, um, parallel box plots, uh,

2172
01:30:14.585 --> 01:30:17.725
are utilized to display test scores between boys

2173
01:30:17.785 --> 01:30:18.845
and girls in a class.

2174
01:30:20.565 --> 01:30:21.615
Then we go onto our test.

2175
01:30:21.715 --> 01:30:25.295
The distribution of boys' scores, test scores, on the

2176
01:30:25.295 --> 01:30:27.935
test is negatively skewed whilst the girls' scores

2177
01:30:28.155 --> 01:30:29.295
is positively skewed.

2178
01:30:30.125 --> 01:30:31.905
You've compared the shape. Done.

2179
01:30:33.595 --> 01:30:35.145
There are no outliers.

2180
01:30:35.705 --> 01:30:38.265
I would rather you say there are no outliers in each

2181
01:30:39.105 --> 01:30:41.225
category that works better.

2182
01:30:42.165 --> 01:30:44.025
The median score for boys is higher.

2183
01:30:44.605 --> 01:30:48.745
(median equals 23) than for girls (median equals 9.5).

2184
01:30:48.885 --> 01:30:50.465
Really, really good. They stated

2185
01:30:50.485 --> 01:30:51.665
the values really important.

2186
01:30:51.665 --> 01:30:54.025
You shouldn't just say one is higher than the other

2187
01:30:54.025 --> 01:30:55.265
without stating the values.

2188
01:30:56.325 --> 01:30:59.105
The IQR is smaller for boys (IQR equals 10) than for girls.

2189
01:30:59.485 --> 01:31:00.985
The range of scores for boys

2190
01:31:00.985 --> 01:31:02.585
and girls is equal (range equals 90).

2191
01:31:02.765 --> 01:31:06.065
Really awesome response outside of not having the context,

2192
01:31:06.495 --> 01:31:08.545
this is a great response in terms of that they

2193
01:31:09.085 --> 01:31:10.105
stated the values,

2194
01:31:10.585 --> 01:31:12.785
whenever they compare, they always compare.

2195
01:31:12.785 --> 01:31:14.225
They always said, you know, one is higher than the

2196
01:31:14.225 --> 01:31:15.385
other, one is higher than the other.

2197
01:31:15.965 --> 01:31:17.665
Um, then they said that they're equal,

2198
01:31:18.045 --> 01:31:20.265
but they gave the value every single time

2199
01:31:20.365 --> 01:31:21.865
and that's really, really important.

2200
01:31:22.085 --> 01:31:23.985
So that's the best way to describe it.
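
NOTE
A minimal Python sketch of the comparison style described above: every statement compares the two groups and quotes both values.
The score lists, the function names and the quartile rule (medians of the lower and upper halves) are assumptions made for illustration; they are not the data shown on the slide.
```python
from statistics import median
def summarise(values):
    """Median, IQR and range, with quartiles taken as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]  # skip the middle value when n is odd
    return {"median": median(data), "iqr": median(upper) - median(lower), "range": data[-1] - data[0]}
def compare(name_a, a, name_b, b, stat):
    """Build one comparison sentence that always quotes both values."""
    va, vb = summarise(a)[stat], summarise(b)[stat]
    if va == vb:
        return f"The {stat} is equal for {name_a} and {name_b} ({stat} = {va})."
    word = "higher" if va > vb else "lower"
    return f"The {stat} for {name_a} is {word} ({stat} = {va}) than for {name_b} ({stat} = {vb})."
# Hypothetical scores, invented for illustration only.
boys = [12, 18, 21, 23, 24, 25, 26]
girls = [5, 7, 8, 9, 10, 14, 16, 19]
for stat in ("median", "iqr", "range"):
    print(compare("boys", boys, "girls", girls, stat))
```
A written response would still open with the context sentence and a comment on shape and outliers, which this sketch does not attempt.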

2201
01:31:24.045 --> 01:31:26.425
You will commonly get asked to describe categorical

2202
01:31:26.725 --> 01:31:30.825
and numerical bivariate data as a box plot, less commonly

2203
01:31:30.925 --> 01:31:31.945
as a dot plot

2204
01:31:32.055 --> 01:31:35.345
because a dot plot is less, it's not as easy to work with

2205
01:31:35.685 --> 01:31:38.705
and very, very rarely as a back-to-back stem-and-leaf plot.

2206
01:31:38.765 --> 01:31:40.545
That's also quite difficult to work with.

2207
01:31:42.445 --> 01:31:44.505
But then we move into bivariate, um,

2208
01:31:45.135 --> 01:31:46.745
bivariate numerical, I mean, sorry.

2209
01:31:46.885 --> 01:31:48.505
So we've got numerical versus numerical.

2210
01:31:48.505 --> 01:31:51.705
Well in this case here we use a scatter plot.

2211
01:31:51.705 --> 01:31:53.665
Now this is easily the most useful

2212
01:31:53.725 --> 01:31:56.465
and the most common in real world examples.

2213
01:31:57.365 --> 01:31:58.535
It's just so much better

2214
01:31:58.535 --> 01:32:00.085
to look at numerical versus numerical.

2215
01:32:00.085 --> 01:32:01.565
You get so much more information out of it

2216
01:32:01.665 --> 01:32:03.725
and you can sort of plot this out on your calculator

2217
01:32:03.725 --> 01:32:04.885
and it looks a lot like this.

2218
01:32:05.425 --> 01:32:10.285
Now when you get a piece of, you know, a piece

2219
01:32:10.285 --> 01:32:12.685
of bivariate data that is numerical-numerical

2220
01:32:12.685 --> 01:32:13.685
and you get a scatter plot,

2221
01:32:13.745 --> 01:32:15.565
you're gonna be asked three things about it.

2222
01:32:15.565 --> 01:32:17.085
You're gonna be asked, well what is

2223
01:32:17.085 --> 01:32:18.125
the strength of the data?

2224
01:32:18.595 --> 01:32:19.845
What is the direction of the data?

2225
01:32:19.985 --> 01:32:21.365
And what is the form of the data?

2226
01:32:21.385 --> 01:32:23.685
And you're gonna say to me, what the heck does that mean?

2227
01:32:24.065 --> 01:32:25.085
So let's go through each of those.

2228
01:32:26.195 --> 01:32:27.465
First of all, talk about strength.

2229
01:32:28.345 --> 01:32:29.865
Strength of the data is calculated

2230
01:32:29.885 --> 01:32:31.905
by your Pearson's correlation coefficient.

2231
01:32:31.905 --> 01:32:33.665
However, if you didn't do general one, two,

2232
01:32:33.665 --> 01:32:35.985
you won't have heard of this before if you did,

2233
01:32:36.245 --> 01:32:37.385
it should be a little bit of revision.

2234
01:32:37.725 --> 01:32:39.105
But this is your r value.

2235
01:32:39.575 --> 01:32:43.145
Your r value measures the strength of a linear relationship.

2236
01:32:43.525 --> 01:32:47.265
So what we wanna talk about in our scatterplots is

2237
01:32:47.325 --> 01:32:48.505
how linear is our data?

2238
01:32:48.535 --> 01:32:50.905
Does our data actually have some sort of linearity to it?

2239
01:32:50.925 --> 01:32:53.425
And when it's linear, it generally has a trend.

2240
01:32:55.195 --> 01:32:57.455
We generally assume that a linear relationship is present.

2241
01:32:57.555 --> 01:32:59.695
In some cases it isn't. But we'll get to that.

2242
01:33:00.275 --> 01:33:02.615
If there is not a linear relationship present,

2243
01:33:02.615 --> 01:33:04.575
like your data clearly curves up

2244
01:33:04.575 --> 01:33:07.175
and then curves down, you should never do an R value

2245
01:33:07.325 --> 01:33:10.215
because if there's a clear curve to it, we assume

2246
01:33:10.685 --> 01:33:12.775
that the data trends together.

2247
01:33:13.195 --> 01:33:14.935
But your R value may not say that

2248
01:33:14.935 --> 01:33:16.335
because there is a curve to it.

2249
01:33:16.435 --> 01:33:18.215
If your data does not show a distinct curve

2250
01:33:18.215 --> 01:33:19.695
and it's sort of like a little bit of a mess,

2251
01:33:19.955 --> 01:33:21.055
you then do an R value

2252
01:33:21.315 --> 01:33:23.815
and your R value will correctly tell you whether there is

2253
01:33:23.975 --> 01:33:26.055
actually a linear relationship in there or there is not.

2254
01:33:27.235 --> 01:33:29.575
Um, you always find the R value using your calculator.

2255
01:33:29.795 --> 01:33:30.975
You cannot do it by hand.

2256
01:33:31.315 --> 01:33:33.095
You can, but it's technically way too hard

2257
01:33:33.115 --> 01:33:34.655
and it's not part of the VCAA

2258
01:33:34.955 --> 01:33:37.495
slash VCE course. You need to do it on the calculator,

2259
01:33:37.495 --> 01:33:39.095
always. It's in your statistics

2260
01:33:39.095 --> 01:33:41.815
or you can do it in um, the main page if you have the right,

2261
01:33:42.355 --> 01:33:43.815
um, pieces of information.

2262
01:33:44.915 --> 01:33:46.575
Now what's really important is this here.

2263
01:33:46.755 --> 01:33:49.735
Now this should be found in your summary book.

2264
01:33:50.405 --> 01:33:54.495
This is a little summary of how you describe your R value.

2265
01:33:54.495 --> 01:33:56.535
So you find your R value, you put your data in

2266
01:33:56.755 --> 01:33:58.735
or they give you the right information

2267
01:33:59.195 --> 01:34:00.375
and you get your R value.

2268
01:34:00.555 --> 01:34:02.655
And then you need to describe what is the strength.

2269
01:34:03.125 --> 01:34:07.735
Well, you are going to say that if your R value is

2270
01:34:07.735 --> 01:34:12.655
between 0.75 and 0.99 or negative 0.75

2271
01:34:12.755 --> 01:34:16.775
and negative 0.99, the closer your R value is to one

2272
01:34:17.435 --> 01:34:20.735
or negative one, the stronger it is,

2273
01:34:21.115 --> 01:34:24.295
the closer your value is to zero, the weaker it is.

2274
01:34:24.635 --> 01:34:29.055
So you always wanna say if it is between 0.75 and 0.99

2275
01:34:29.055 --> 01:34:31.695
or negative 0.75 and negative 0.99.

2276
01:34:31.715 --> 01:34:35.295
It has, it is a strong linear relationship with your value.

2277
01:34:35.995 --> 01:34:38.175
And then you can add in this extra bit

2278
01:34:38.195 --> 01:34:39.575
and you can discuss direction.

2279
01:34:39.575 --> 01:34:40.535
So direction was one of the

2280
01:34:40.535 --> 01:34:41.575
other points we wanted to talk about.

2281
01:34:41.955 --> 01:34:43.495
The R value also gives you direction.

2282
01:34:44.075 --> 01:34:47.935
If it is a negative value, it is a negative direction,

2283
01:34:48.045 --> 01:34:49.295
it's going down this way.

2284
01:34:49.595 --> 01:34:52.975
If it is a positive value, it is going up this way.

2285
01:34:54.075 --> 01:34:56.855
So that that is how you describe direction.

2286
01:34:56.855 --> 01:34:58.805
So your R value gives you

2287
01:34:58.805 --> 01:35:00.125
both strength and direction.

2288
01:35:00.145 --> 01:35:01.245
So super duper important.
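
NOTE
A minimal Python sketch of turning an r value into a strength-and-direction description.
Only the "strong" band (0.75 and above in size) is stated in the lecture; the moderate (0.5), weak (0.25) and "no association" cut-offs used here are assumed from the usual summary-table convention, and the function name is hypothetical.
```python
def describe_r(r):
    """Describe strength and direction of a linear association from Pearson's r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends only on the distance from zero
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.5:  # assumed cut-off
        strength = "moderate"
    elif size >= 0.25:  # assumed cut-off
        strength = "weak"
    else:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"  # the sign of r gives the direction
    return f"{strength} {direction} linear association"
print(describe_r(0.82))
print(describe_r(-0.6))
print(describe_r(0.1))
```
The description only makes sense when the scatter plot looks roughly linear; for clearly curved data the r value should not be used.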

2289
01:35:02.265 --> 01:35:04.285
Um, so you can only calculate R values

2290
01:35:04.345 --> 01:35:06.325
for linear datasets, as we already discussed.

2291
01:35:07.155 --> 01:35:09.775
So as you can see here, you've got a positive one on the

2292
01:35:09.775 --> 01:35:10.815
left and you got a negative on the right.

2293
01:35:10.815 --> 01:35:12.975
So you can technically say, you know, positive one,

2294
01:35:12.995 --> 01:35:15.215
you can tell just by looking at it. Your R value gives you more

2295
01:35:15.215 --> 01:35:17.535
of your strength, but you can do both from your R value.

2296
01:35:19.705 --> 01:35:23.245
Um, and then you've got your form.

2297
01:35:23.425 --> 01:35:25.365
So your form is where you say, okay,

2298
01:35:26.425 --> 01:35:28.145
I think my data is linear.

2299
01:35:28.585 --> 01:35:29.585
I think my data is not linear.

2300
01:35:29.785 --> 01:35:31.345
I think my data doesn't have an association.

2301
01:35:31.385 --> 01:35:33.385
I think my data is non-linear.

2302
01:35:33.485 --> 01:35:34.905
So it's very much like strength.

2303
01:35:35.365 --> 01:35:38.025
Um, but it's where you discuss any of these four points.

2304
01:35:38.045 --> 01:35:40.105
So you need to mention these four points in there.

2305
01:35:40.725 --> 01:35:42.465
Um, and then here's another really nice little summary.

2306
01:35:42.515 --> 01:35:44.705
Again, I do apologize. I think when I stretched this,

2307
01:35:44.705 --> 01:35:46.825
this sort of squeezed up some

2308
01:35:46.825 --> 01:35:48.745
of the pages when I stretched it, it squeezed

2309
01:35:48.745 --> 01:35:50.065
and some of the pages stretched.

2310
01:35:50.065 --> 01:35:52.465
It was a bit weird. Um, I did think, I thought I'd gone through and

2311
01:35:52.465 --> 01:35:54.345
fixed most of it, but I obviously missed a couple of pages.

2312
01:35:54.515 --> 01:35:55.745
I do apologize. But

2313
01:35:55.745 --> 01:35:57.625
nonetheless, in this, in this one here,

2314
01:35:57.625 --> 01:36:01.665
this is a really good way of describing sort

2315
01:36:01.665 --> 01:36:03.185
of your linear data

2316
01:36:04.295 --> 01:36:06.475
and describing your scatter plots

2317
01:36:06.955 --> 01:36:08.155
dependent on your R value.

2318
01:36:08.155 --> 01:36:10.915
So if it's linear, positive and strong: it can be concluded

2319
01:36:10.915 --> 01:36:12.035
that the y values increase

2320
01:36:12.035 --> 01:36:14.435
as the x values increase. Linear, positive, moderate:

2321
01:36:14.435 --> 01:36:17.435
there is some evidence to suggest. Linear, positive, weak:

2322
01:36:17.435 --> 01:36:19.035
there is limited evidence to suggest.

2323
01:36:19.035 --> 01:36:21.275
So these are really good ways of describing what's going on

2324
01:36:21.275 --> 01:36:23.555
with your R value and what's going on with your scatter plot

2325
01:36:23.695 --> 01:36:26.915
and it's a great way to put that information into use.

2326
01:36:28.305 --> 01:36:33.155
Alright, now in our last 20 ish minutes,

2327
01:36:33.855 --> 01:36:37.875
we are going to cover bivariate modeling.

2328
01:36:38.615 --> 01:36:41.235
Um, so as you can see, there's our summary for bivariate.

2329
01:36:41.445 --> 01:36:43.195
We're gonna cover our modeling data.

2330
01:36:43.375 --> 01:36:45.395
So we're up to page 110.

2331
01:36:45.855 --> 01:36:48.235
Um, as you can see, these slides are actually not meant

2332
01:36:48.235 --> 01:36:51.995
to go past a hundred, but given today we are, um,

2333
01:36:52.095 --> 01:36:55.635
and we've only got 124 slides, so we've got 14 slides to go

2334
01:36:55.635 --> 01:36:57.595
through in our last 20 ish minutes, which is pretty good

2335
01:36:57.595 --> 01:37:00.755
because that's about how much information there is.

2336
01:37:02.535 --> 01:37:04.235
So bivariate data,

2337
01:37:04.235 --> 01:37:06.115
and in particular bivariate data

2338
01:37:06.115 --> 01:37:08.675
with two numerical values, is extremely useful.

2339
01:37:08.905 --> 01:37:10.915
This is because we can use it to construct models,

2340
01:37:11.115 --> 01:37:13.595
mathematical equations that allow us to predict the

2341
01:37:13.595 --> 01:37:15.395
values of data points we didn't even measure.

2342
01:37:15.775 --> 01:37:18.435
So as you can see here with our scatter plot,

2343
01:37:19.635 --> 01:37:23.095
our scatterplot here clearly shows a trend

2344
01:37:23.395 --> 01:37:25.535
and it clearly would have a pretty strong r value.

2345
01:37:25.725 --> 01:37:28.255
It's positive and it's a linear form.

2346
01:37:29.815 --> 01:37:32.625
What if I wanted to predict the value that was out here?

2347
01:37:32.695 --> 01:37:34.465
What if this x axis was time?

2348
01:37:34.965 --> 01:37:38.185
And as you can see, we, we set zero maybe as you know,

2349
01:37:38.185 --> 01:37:40.905
year 2000 and we're up to like, you know, at 60,

2350
01:37:41.045 --> 01:37:42.385
you know, or whatever.

2351
01:37:42.385 --> 01:37:43.585
However many years we've gone forward,

2352
01:37:43.725 --> 01:37:45.865
can we predict what's gonna happen in the next couple of years?

2353
01:37:46.295 --> 01:37:49.305
Well, essentially we can with

2354
01:37:50.215 --> 01:37:53.105
this data if we make a line of best fit

2355
01:37:53.285 --> 01:37:57.645
and then say where should our data really be

2356
01:37:58.065 --> 01:37:59.165
in a couple of years time?

2357
01:37:59.585 --> 01:38:02.365
And that is where we can predict and we can model data.

2358
01:38:03.145 --> 01:38:06.515
So how do we come up with a line of best fit?

2359
01:38:06.895 --> 01:38:09.435
So we come up with a line of best fit with residuals.

2360
01:38:09.735 --> 01:38:14.435
Now a line of best fit essentially minimizes if I added.

2361
01:38:14.575 --> 01:38:16.595
So if I draw a line through through the data

2362
01:38:16.735 --> 01:38:21.395
and then I get the, the vertical distance between each data point

2363
01:38:21.415 --> 01:38:24.715
and the line, square each one, and add all of those up, my line

2364
01:38:24.715 --> 01:38:26.395
of best fit is the line that makes

2365
01:38:26.395 --> 01:38:28.395
that the smallest number possible.

2366
01:38:29.415 --> 01:38:32.275
So it makes sure that the sum of all

2367
01:38:32.395 --> 01:38:34.795
of these residuals here, and we call 'em residuals.

2368
01:38:34.795 --> 01:38:36.315
The residual is essentially the distance

2369
01:38:36.315 --> 01:38:38.915
between the data point and the line of best fit.

2370
01:38:39.265 --> 01:38:42.355
It makes sure the sum of the squares of all the residuals in this, uh,

2371
01:38:42.585 --> 01:38:44.395
plot is the smallest possible.

2372
01:38:44.615 --> 01:38:46.395
Now that's really hard to do by hand.

2373
01:38:46.775 --> 01:38:49.035
So essentially we do it on the calculator

2374
01:38:49.335 --> 01:38:51.955
and we come up with these cool equations

2375
01:38:51.955 --> 01:38:52.995
that give you what you want.
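
NOTE
A minimal Python sketch of what the calculator does when it finds the line of best fit: the least-squares line is the one with the smallest sum of squared residuals.
The data points and function names are hypothetical, chosen only so the arithmetic is easy to follow.
```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b
def sum_squared_residuals(xs, ys, a, b):
    """Residual = actual y minus predicted y; square each one and add them up."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
a, b = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# Nudging the intercept or the slope away from the fitted values makes the total larger.
shifted = sum_squared_residuals(xs, ys, a + 0.5, b)
tilted = sum_squared_residuals(xs, ys, a, b + 0.2)
print(a, b, best, shifted, tilted)
```
Because the residuals are squared, a single outlier far from the line contributes heavily to the total, which is why outliers can pull the fitted line towards themselves.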

2376
01:38:53.455 --> 01:38:55.955
So this works best if there's no outliers.

2377
01:38:55.975 --> 01:38:58.515
If you've got outliers, it's a little bit difficult because

2378
01:38:58.515 --> 01:39:01.515
therefore you've gotta sort of, um, operate around them

2379
01:39:01.695 --> 01:39:03.355
and they can get really confusing.

2380
01:39:03.695 --> 01:39:05.315
So please make sure you try

2381
01:39:05.315 --> 01:39:06.715
to use data that doesn't have outliers.

2382
01:39:06.715 --> 01:39:10.515
If it does, therefore it can, uh, skew it a little bit.

2383
01:39:12.905 --> 01:39:15.565
So what does a line of best fit look like?

2384
01:39:15.625 --> 01:39:17.925
It looks like Y equals A plus BX.

2385
01:39:17.945 --> 01:39:19.165
Now, a lot of you will say, well

2386
01:39:19.165 --> 01:39:21.125
that looks a lot like Y equals MX plus C.

2387
01:39:21.595 --> 01:39:23.885
It's exactly the same. There's no other way of putting it.

2388
01:39:23.885 --> 01:39:26.525
It's exactly the same. However we put the C at the front

2389
01:39:26.625 --> 01:39:30.005
and we call it a, and we make the m a b.

2390
01:39:30.505 --> 01:39:34.055
So what that means, you need

2391
01:39:34.055 --> 01:39:38.335
to have your c, which is your y-axis,

2392
01:39:39.355 --> 01:39:44.175
um, intercept. Your y-axis intercept, which is c, is now a

2393
01:39:44.275 --> 01:39:47.615
and it has to be before your MX

2394
01:39:47.755 --> 01:39:49.415
or your BX in this case.

2395
01:39:49.755 --> 01:39:51.935
And then your B which is now, which is your,

2396
01:39:51.935 --> 01:39:54.655
which was your M which is your gradient needs

2397
01:39:54.675 --> 01:39:56.295
to be after it.

2398
01:39:57.085 --> 01:40:01.465
Um, so we use y equals A plus bx. It's a linear equation.

2399
01:40:01.945 --> 01:40:04.185
Y is your response variable again,

2400
01:40:04.445 --> 01:40:07.585
and x is your explanatory variable, just like our axes.

2401
01:40:07.585 --> 01:40:10.705
So we have our x on our, on our, um, x-axis

2402
01:40:10.705 --> 01:40:12.585
and our y on our y-axis. Makes sense.

2403
01:40:14.005 --> 01:40:15.185
And please make sure you're entering

2404
01:40:15.185 --> 01:40:16.425
these variables in correct order.

2405
01:40:16.735 --> 01:40:19.025
Classic VCAA trick: to give you the y variable

2406
01:40:19.025 --> 01:40:21.345
before the x variable, or the x variable before the y variable.

2407
01:40:21.405 --> 01:40:22.425
And they can confuse you.

2408
01:40:22.425 --> 01:40:23.825
Please make sure you're putting your

2409
01:40:23.855 --> 01:40:25.225
your values in the right way.

2410
01:40:26.005 --> 01:40:28.545
Now there are two ways to calculate this line

2411
01:40:28.545 --> 01:40:30.065
and there are two scenarios that you'll be given.

2412
01:40:30.395 --> 01:40:32.305
First of all, you'll just be given all the raw data.

2413
01:40:32.405 --> 01:40:33.505
If you're given all the raw data,

2414
01:40:33.645 --> 01:40:35.585
you chuck it in a sheet in your calculator.

2415
01:40:35.585 --> 01:40:37.545
So you go to your sheets, part of your calculator

2416
01:40:37.805 --> 01:40:39.465
and you put it all in your sheets.

2417
01:40:39.525 --> 01:40:40.945
You go to your sheets and you go, all right,

2418
01:40:40.945 --> 01:40:42.505
I'm just gonna stick it all in my sheets

2419
01:40:42.845 --> 01:40:43.985
and I'm gonna put it all in there

2420
01:40:44.165 --> 01:40:45.865
and I'm gonna get my information that I need.

2421
01:40:47.125 --> 01:40:48.385
Now the second way

2422
01:40:48.385 --> 01:40:50.265
of doing this is they give you these weird pieces

2423
01:40:50.285 --> 01:40:52.905
of information and they give you your

2424
01:40:53.005 --> 01:40:54.405
Pearson's correlation coefficient.

2425
01:40:54.435 --> 01:40:57.245
They give you your standard deviation of your Y data set,

2426
01:40:57.245 --> 01:40:59.085
your standard deviation of your X data set

2427
01:40:59.425 --> 01:41:01.085
and then your mean of your X data set

2428
01:41:01.085 --> 01:41:02.645
and your mean of your Y data set.

2429
01:41:03.025 --> 01:41:06.155
And they say, tell me what the line best is.

2430
01:41:06.155 --> 01:41:07.915
And you'll be sitting there like what is going on?

2431
01:41:08.065 --> 01:41:11.795
Well the best way to do this is to use these formula here.

2432
01:41:11.855 --> 01:41:13.715
So these formula need

2433
01:41:13.715 --> 01:41:16.315
to be in your summary book slash bound reference.

2434
01:41:16.405 --> 01:41:19.115
Again, a non-negotiable, they have to be in there.

2435
01:41:19.765 --> 01:41:24.515
These formula read B equals, so your slope gradient

2436
01:41:24.515 --> 01:41:26.075
or your M value if you like, things like that.

2437
01:41:26.075 --> 01:41:27.515
But it's your B is equal

2438
01:41:27.515 --> 01:41:30.715
to your Pearson's correlation coefficient multiplied

2439
01:41:30.815 --> 01:41:33.355
by your standard deviation of Y divided

2440
01:41:33.375 --> 01:41:34.835
by your standard deviation of x.

2441
01:41:36.195 --> 01:41:38.335
Really important that that is how that works.

2442
01:41:38.955 --> 01:41:42.785
You do B first, calculate B

2443
01:41:43.345 --> 01:41:45.905
'cause then you need your B value to find your A value

2444
01:41:46.165 --> 01:41:47.265
and your B value

2445
01:41:47.365 --> 01:41:51.465
and your, sorry, your A value is equal to your mean of Y

2446
01:41:51.975 --> 01:41:55.985
take away B multiplied by your mean of X.

2447
01:41:56.685 --> 01:41:59.545
So you need your B value first. Really important.
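
NOTE
A minimal Python sketch of the two formulas just described: b = r × (standard deviation of y) ÷ (standard deviation of x), then a = (mean of y) − b × (mean of x).
The function name and the summary statistics are hypothetical, chosen only to show the order of the calculation.
```python
def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y = a + b*x from Pearson's r, the standard deviations and the means."""
    b = r * s_y / s_x  # slope first
    a = y_bar - b * x_bar  # the intercept needs the slope
    return a, b
# Hypothetical summary statistics, invented for illustration only.
a, b = line_from_summary(r=0.8, s_x=2, s_y=5, x_bar=10, y_bar=50)
print(f"y = {a:.1f} + {b:.1f}x")
```
Swapping s_x and s_y, or the two means, is the easiest mistake to make here, so it is worth labelling which variable is explanatory before substituting.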

2448
01:42:00.465 --> 01:42:04.595
So as you can see here, we have a model of data.

2449
01:42:04.735 --> 01:42:06.435
We have some raw data given to you.

2450
01:42:06.455 --> 01:42:08.035
So the best way to go about this

2451
01:42:08.195 --> 01:42:11.155
'cause you've been given raw data, is bring calculator,

2452
01:42:11.175 --> 01:42:12.355
you've been given height and weight.

2453
01:42:12.705 --> 01:42:13.955
Well you need to figure out which

2454
01:42:13.955 --> 01:42:15.595
of these is the X and which of these is the Y.

2455
01:42:15.595 --> 01:42:16.755
Well, I'm gonna say right now

2456
01:42:16.755 --> 01:42:19.955
that my height is probably gonna explain my weight more than

2457
01:42:19.955 --> 01:42:21.195
my weight explains my height.

2458
01:42:21.575 --> 01:42:22.355
Um, if I'm taller

2459
01:42:22.415 --> 01:42:23.395
I'm probably gonna weigh a little bit more.

2460
01:42:23.425 --> 01:42:24.515
It's just naturally what happens.

2461
01:42:25.055 --> 01:42:29.465
So height as my x value, uh,

2462
01:42:29.465 --> 01:42:30.585
weight as my Y value.

2463
01:42:30.605 --> 01:42:32.465
So you need to interpret that, you need to be able

2464
01:42:32.465 --> 01:42:34.145
to use logical sense to do that.

2465
01:42:34.645 --> 01:42:37.585
So I put this data into my Casio.

2466
01:42:37.805 --> 01:42:38.945
So I put into a list or

2467
01:42:38.945 --> 01:42:40.225
spreadsheets for the one you wanna use.

2468
01:42:40.645 --> 01:42:43.305
And then I find my regression equation.

2469
01:42:43.365 --> 01:42:44.785
So I go weight, I put in

2470
01:42:44.785 --> 01:42:47.865
and I go all I want linear regression, I do progression.

2471
01:42:48.325 --> 01:42:50.825
Um, usually it's in like your settings

2472
01:42:50.825 --> 01:42:52.665
or your calculations, part of your cassio.

2473
01:42:53.165 --> 01:42:58.065
And you find this exact piece of information here.

2474
01:42:58.065 --> 01:42:59.985
So you will go through and it will give it to you.

2475
01:43:00.085 --> 01:43:01.345
And then you put that on the page.

2476
01:43:01.485 --> 01:43:06.225
Really important, you don't write y equals 58.022 plus

2477
01:43:06.325 --> 01:43:07.825
1.63 by x.

2478
01:43:08.165 --> 01:43:10.345
You put in your variables,

2479
01:43:10.345 --> 01:43:14.985
you say weight equals 58.022 multiply, uh,

2480
01:43:14.985 --> 01:43:17.665
plus 1.63 multiplied by your height.
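
NOTE
A minimal sketch of what the calculator's linear regression step is doing, assuming made-up height and weight data (these numbers are not the data set on the slide) and a hypothetical helper called least_squares. It ends by writing the equation with the variable names rather than x and y, which is the point made above.
```python
# Least-squares line written out by hand so each step is visible.
# The heights (cm) and weights (kg) are made-up illustration data.
from statistics import mean
heights = [150, 160, 165, 170, 180]   # explanatory variable (x)
weights = [52, 58, 63, 66, 74]        # response variable (y)
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = y_bar - b * x_bar       # y-intercept
    return a, b
a, b = least_squares(heights, weights)
# Report the equation using the variable names, not x and y.
equation = f"weight = {a:.3f} + {b:.3f} * height"
print(equation)
```
On the exam you would read a and b straight off the calculator; the sketch only shows where those two numbers come from.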

2481
01:43:19.285 --> 01:43:21.745
Now you also may need

2482
01:43:21.745 --> 01:43:23.145
to calculate these values.

2483
01:43:23.605 --> 01:43:26.945
Now the best way, uh, to describe these values

2484
01:43:26.945 --> 01:43:29.545
is with these exact lines here.

2485
01:43:30.145 --> 01:43:33.265
I used these exact lines in year 12.

2486
01:43:33.585 --> 01:43:34.785
I was provided them by my teacher.

2487
01:43:35.605 --> 01:43:37.305
Um, and then I found, I came here

2488
01:43:37.305 --> 01:43:39.465
to work at ATAR notes in 2019

2489
01:43:40.165 --> 01:43:43.545
and the lines I used were literally like two words different

2490
01:43:43.845 --> 01:43:45.545
or like two words that were different and that's all.

2491
01:43:45.605 --> 01:43:47.105
And they were exactly the same, pretty much.

2492
01:43:47.105 --> 01:43:50.205
These lines are universal.

2493
01:43:50.345 --> 01:43:54.365
You need to be able to put these two lines on the

2494
01:43:54.395 --> 01:43:56.525
exam, on your SACs, and get the marks.

2495
01:43:56.675 --> 01:44:00.285
Essentially if I'm asked to describe my y intercept,

2496
01:44:00.285 --> 01:44:04.525
if I'm asked to describe a, I say the y intercept is at

2497
01:44:05.145 --> 01:44:06.245
and I get my a value.

2498
01:44:07.155 --> 01:44:08.775
And if I'm talking about, again,

2499
01:44:08.875 --> 01:44:10.255
I'm talking about weight and height.

2500
01:44:10.595 --> 01:44:12.255
So remember that weight was my,

2501
01:44:12.755 --> 01:44:15.615
was my uh, response variable.

2502
01:44:15.755 --> 01:44:19.015
So weight was on my y axis.

2503
01:44:19.515 --> 01:44:23.295
So my y intercept is, let's say A is 15, I say 15

2504
01:44:23.355 --> 01:44:24.975
and then weight, maybe it's in kilograms,

2505
01:44:24.975 --> 01:44:28.335
I say is at 15 full stop.

2506
01:44:28.765 --> 01:44:33.695
This means the Y variable which was weight is 15 kilograms.

2507
01:44:33.695 --> 01:44:35.815
So at the start you don't wanna say 15 kilograms,

2508
01:44:35.815 --> 01:44:38.095
you wanna say my y-intercept is at 15.

2509
01:44:38.925 --> 01:44:41.095
This means the weight.

2510
01:44:41.595 --> 01:44:45.375
So y variable, this means the weight is 15 kilograms.

2511
01:44:45.925 --> 01:44:50.255
When the X variable, the height is zero,

2512
01:44:50.265 --> 01:44:52.095
maybe it's centimeters, zero centimeters.

2513
01:44:52.195 --> 01:44:53.215
Now that doesn't make any sense.

2514
01:44:53.275 --> 01:44:55.695
Why would I be 15 kilograms if I'm zero centimeters tall?

2515
01:44:56.285 --> 01:44:57.815
That piece of information is a bit useless,

2516
01:44:57.915 --> 01:44:59.895
but they may ask you to interpret that.

2517
01:45:00.995 --> 01:45:02.125
Then we say the slope.

2518
01:45:02.145 --> 01:45:03.965
So if we are asked to interpret B,

2519
01:45:04.625 --> 01:45:08.285
the slope is maybe it's 1.2, 1.2.

2520
01:45:08.795 --> 01:45:11.725
This means that the weight

2521
01:45:13.825 --> 01:45:16.865
increases because 1.2 is positive.

2522
01:45:16.865 --> 01:45:19.425
If it was negative 1.2, I would say this means

2523
01:45:19.425 --> 01:45:22.225
that the weight decreases, but we're using positive here.

2524
01:45:22.285 --> 01:45:26.385
So 1.2 means the weight increases by

2525
01:45:26.925 --> 01:45:31.185
1.2 kilograms for every one

2526
01:45:31.955 --> 01:45:35.385
centimeter increase in height

2527
01:45:35.695 --> 01:45:39.145
because the one centimeter increase in height gives me a 1.2

2528
01:45:39.265 --> 01:45:41.385
kilogram increase in weight.

2529
01:45:42.255 --> 01:45:45.855
So using the word increases when B is positive

2530
01:45:45.855 --> 01:45:47.135
and decreases when B is negative,

2531
01:45:47.135 --> 01:45:48.735
replace everything in, well, what's orange,

2532
01:45:48.795 --> 01:45:51.815
or no, red, um, to fit the context of the question.

2533
01:45:51.815 --> 01:45:54.175
So it's really important that you change everything

2534
01:45:54.175 --> 01:45:57.035
here that's a different color.

2535
01:45:57.255 --> 01:45:59.955
Um, you change that to fit the context of the question.
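
NOTE
A rough sketch of the fill-in-the-blanks idea for the intercept and slope sentences. The wording paraphrases the spoken lines above (the slide's exact text is not in the transcript), and the function names and parameters are hypothetical.
```python
# Template sentences for interpreting a (y-intercept) and b (slope).
# Everything passed in as an argument is what changes with the question's context.
def describe_intercept(a, response, resp_unit, explanatory, expl_unit):
    return (f"The y-intercept is at {a}. This means the {response} is {a} {resp_unit} "
            f"when the {explanatory} is 0 {expl_unit}.")
def describe_slope(b, response, resp_unit, explanatory, expl_unit):
    # Positive b: "increases". Negative b: "decreases". The size is reported without the sign.
    direction = "increases" if b > 0 else "decreases"
    return (f"The slope is {b}. This means the {response} {direction} by {abs(b)} {resp_unit} "
            f"for every one {expl_unit} increase in {explanatory}.")
print(describe_intercept(15, "weight", "kilograms", "height", "centimeters"))
print(describe_slope(1.2, "weight", "kilograms", "height", "centimeter"))
print(describe_slope(-1.2, "weight", "kilograms", "height", "centimeter"))
```
Only the arguments change from question to question; the sentence structure stays the same.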

2536
01:46:01.895 --> 01:46:04.675
Now, um, what about your R squared?

2537
01:46:04.735 --> 01:46:06.435
So you've probably heard of R squared before.

2538
01:46:06.545 --> 01:46:09.715
Undoubtedly you would've gone through R squared in year 11.

2539
01:46:10.135 --> 01:46:12.835
Um, but R squared is used when we can reasonably believe

2540
01:46:12.835 --> 01:46:13.835
there is causation.

2541
01:46:14.215 --> 01:46:16.115
Now we didn't go through causation and correlation

2542
01:46:16.195 --> 01:46:17.995
'cause that's actually been removed, but we

2543
01:46:17.995 --> 01:46:19.235
still go through R squared.

2544
01:46:19.855 --> 01:46:23.835
So R squared tells us the extent to which X explains Y.

2545
01:46:24.255 --> 01:46:25.715
So you get your R value,

2546
01:46:25.715 --> 01:46:28.075
your Pearson's correlation coefficient, and you square it.

2547
01:46:28.495 --> 01:46:30.155
And because you square it, you're going

2548
01:46:30.155 --> 01:46:34.115
to get a positive value between zero and one.

2549
01:46:34.265 --> 01:46:35.795
Your value will be between zero and one

2550
01:46:35.795 --> 01:46:37.115
and it will be a positive value.

2551
01:46:37.115 --> 01:46:38.355
You'll not get a negative value.

2552
01:46:38.375 --> 01:46:39.515
If you get a negative value,

2553
01:46:39.515 --> 01:46:41.475
you've done something wrong, go back and do it again.

2554
01:46:42.055 --> 01:46:46.525
Um, and you put it into this exact line here again,

2555
01:46:46.545 --> 01:46:47.565
or this is where the red was.

2556
01:46:47.565 --> 01:46:51.125
This is why I said, um, fill in the red to fit the question.

2557
01:46:51.465 --> 01:46:52.765
But let's say for my weight

2558
01:46:52.765 --> 01:46:57.685
and my height, I got an R squared value of, let's say, 0.72.

2559
01:46:57.995 --> 01:47:01.965
0.72 is my R squared value. I would say the coefficient

2560
01:47:01.965 --> 01:47:05.975
of determination tells us that 0.72 by a hundred percent,

2561
01:47:06.315 --> 01:47:08.415
uh, will give me 72%.

2562
01:47:08.635 --> 01:47:11.015
So I get 72%

2563
01:47:11.595 --> 01:47:14.135
of the variation in the weight.

2564
01:47:14.805 --> 01:47:17.975
is explained by the variation in the height.

2565
01:47:18.715 --> 01:47:22.855
So I fill in the blanks to make my sentence make sense.
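
NOTE
A small sketch of the R squared step: compute Pearson's r, square it, turn it into a percentage, and drop it into the sentence. The data are made up and the function names are hypothetical; the sentence follows the spoken wording above.
```python
# Pearson's r, r squared, and the standard interpretation sentence.
# Heights (cm) and weights (kg) are made-up illustration data.
from math import sqrt
from statistics import mean
heights = [150, 160, 165, 170, 180]   # explanatory variable
weights = [52, 58, 63, 66, 74]        # response variable
def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
def describe_r_squared(r, response, explanatory):
    percent = r ** 2 * 100      # squaring removes the sign, so this is never negative
    return (f"The coefficient of determination tells us that {percent:.1f}% "
            f"of the variation in the {response} is explained by "
            f"the variation in the {explanatory}.")
r = pearson_r(heights, weights)
print(describe_r_squared(r, "weight", "height"))
```
Because r is squared, a negative r gives the same sentence as a positive r of the same size.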

2566
01:47:24.475 --> 01:47:27.695
R squared will always come out of the calculator positive.

2567
01:47:28.475 --> 01:47:30.655
Um, we can tell if the association is positive,

2568
01:47:30.955 --> 01:47:33.815
or we can tell if it is truly positive

2569
01:47:33.815 --> 01:47:37.095
or negative, by observing the scatterplot or gradient.

2570
01:47:38.115 --> 01:47:43.025
So how do we then interpret the reliability of sort

2571
01:47:43.025 --> 01:47:44.985
of these pieces of information we've looked at?

2572
01:47:45.365 --> 01:47:47.825
So we make a line of best fit and we interpret it all.

2573
01:47:48.165 --> 01:47:50.985
But then what we do is we want to use that line of best fit

2574
01:47:50.985 --> 01:47:52.785
to say, all right, what's gonna happen next year?

2575
01:47:52.785 --> 01:47:55.705
What's gonna happen if I grow this many centimeters?

2576
01:47:56.085 --> 01:47:57.105
What's gonna happen then?

2577
01:47:57.575 --> 01:48:00.985
Well if I use a whole grouping of data

2578
01:48:02.085 --> 01:48:03.945
and let's say we're talking about our weight and height

2579
01:48:04.005 --> 01:48:05.785
and I use, you know, people in my class

2580
01:48:06.485 --> 01:48:10.345
and people in my class uh, vary from 150 centimeters

2581
01:48:10.345 --> 01:48:14.065
to 180 centimeters and I'm 170 centimeters

2582
01:48:14.065 --> 01:48:17.465
and I wanna predict if I go to 175 centimeters, I go, I grow

2583
01:48:17.465 --> 01:48:19.305
by five centimeters, what is my weight gonna be?

2584
01:48:19.725 --> 01:48:23.305
So I get my line of best fit and I put in my X value

2585
01:48:23.885 --> 01:48:27.345
or my height as 175

2586
01:48:27.405 --> 01:48:29.345
and I get a weight of, you know, whatever it is,

2587
01:48:29.345 --> 01:48:32.425
let's just say 75 kilos or something, or 72 kilos, whatever

2588
01:48:32.425 --> 01:48:35.595
it is. That there is my prediction.

2589
01:48:35.695 --> 01:48:36.875
I'm predicting that if I go to

2590
01:48:36.875 --> 01:48:38.355
that I should get to 72 kilos.

2591
01:48:38.355 --> 01:48:41.405
Well, that's according to the data. That there is

2592
01:48:41.405 --> 01:48:43.645
what we call an interpolation.

2593
01:48:43.995 --> 01:48:46.805
That is a fairly reliable prediction. Why?

2594
01:48:46.875 --> 01:48:48.645
Because it's within the data set.

2595
01:48:48.745 --> 01:48:50.885
My data set I used to make my line

2596
01:48:50.885 --> 01:48:52.925
of best fit was from 150 to 180.

2597
01:48:53.365 --> 01:48:56.725
I predicted within that. Now what if I was that 180?

2598
01:48:56.725 --> 01:48:58.125
What if I was that final data point

2599
01:48:58.185 --> 01:49:02.005
and I knew I was going to grow somehow to 185 centimeters.

2600
01:49:02.545 --> 01:49:06.005
So I now want to use that line of best fit to predict

2601
01:49:06.035 --> 01:49:08.245
what my weight will be when I'm 185.

2602
01:49:08.345 --> 01:49:10.685
And let's just say that says I'm gonna be 80 kilos,

2603
01:49:10.705 --> 01:49:13.245
so I got 185 centimeters, I'm gonna be 80 kilos.

2604
01:49:14.485 --> 01:49:18.575
That there is extrapolation. That is outside my data set.

2605
01:49:18.875 --> 01:49:23.615
I'm now outside of that a hundred, I'm

2606
01:49:23.615 --> 01:49:27.135
outside of that 150 to 180 on my x axis.

2607
01:49:27.135 --> 01:49:29.255
I've moved outside of that. I'm at 185.

2608
01:49:29.435 --> 01:49:32.175
I'm now predicting beyond what I know.

2609
01:49:32.725 --> 01:49:35.415
Therefore we call this an unreliable prediction as much

2610
01:49:35.415 --> 01:49:36.615
as it is still utilized.

2611
01:49:36.615 --> 01:49:38.495
And that's what we do when we have time series.

2612
01:49:38.515 --> 01:49:41.615
We predict using those, we still call it unreliable

2613
01:49:41.685 --> 01:49:45.695
because it is outside the data set so it is extrapolation.

2614
01:49:46.155 --> 01:49:49.145
So therefore we call that unreliable extrapolation.
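
NOTE
The interpolation versus extrapolation check comes down to one comparison against the range of x values used to build the line. A minimal sketch, assuming a made-up class whose heights run from 150 to 180 cm as in the example above; the function name is hypothetical.
```python
# Label a prediction by whether the x value sits inside the range of the original data.
def prediction_type(x_new, xs):
    """Interpolation if x_new lies within the x range used to fit the line."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation (reliable)"
    return "extrapolation (unreliable)"
class_heights = [150, 158, 163, 170, 180]   # made-up heights spanning 150 to 180 cm
print(175, prediction_type(175, class_heights))
print(185, prediction_type(185, class_heights))
```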

2615
01:49:51.645 --> 01:49:53.385
Now how can we mathematically check?

2616
01:49:53.925 --> 01:49:57.025
Now the next point, and this sort of builds on from

2617
01:49:57.025 --> 01:50:00.185
what we've just been looking at, how do we then check if my

2618
01:50:00.185 --> 01:50:01.625
scatterplot is actually truly linear

2619
01:50:01.765 --> 01:50:03.545
and I can actually use it for prediction?

2620
01:50:03.545 --> 01:50:05.025
Well, we talked about residuals earlier

2621
01:50:05.025 --> 01:50:07.065
and how we want to minimize those residuals.

2622
01:50:07.245 --> 01:50:09.345
We can actually plot the residuals on a graph

2623
01:50:09.805 --> 01:50:11.905
and what we do is we take our actual Y value

2624
01:50:12.005 --> 01:50:14.385
and we take our predicted y value away from it.

2625
01:50:14.445 --> 01:50:16.065
So we take away the value from our line of best fit

2626
01:50:16.165 --> 01:50:18.985
and we get residuals and then we plot those on a graph

2627
01:50:19.325 --> 01:50:21.065
and they should look something like this.

2628
01:50:21.765 --> 01:50:25.855
Now you should be able to get a

2629
01:50:27.295 --> 01:50:29.415
graph that sits

2630
01:50:29.415 --> 01:50:32.975
around a line, a line-through-zero graph.

2631
01:50:32.995 --> 01:50:35.175
So a line-through-zero graph is a graph that, as you can see here,

2632
01:50:35.445 --> 01:50:38.135
it's got your standardized residual on your Y axis

2633
01:50:38.135 --> 01:50:39.455
and it's got your predicted

2634
01:50:40.115 --> 01:50:43.055
or your residual sort of values,

2635
01:50:44.075 --> 01:50:46.135
um, not your residual value.

2636
01:50:46.155 --> 01:50:50.055
So your raw X values on your X axis.

2637
01:50:50.935 --> 01:50:52.675
And as you can see here, you've got a line through zero

2638
01:50:53.695 --> 01:50:56.075
If your residual plot produces a pattern,

2639
01:50:56.655 --> 01:50:58.875
if your residual plot produces a pattern, no matter

2640
01:50:58.875 --> 01:51:00.555
what it is, it can be a straight line pattern.

2641
01:51:00.575 --> 01:51:02.395
It could be a straight line that goes here to here.

2642
01:51:02.935 --> 01:51:05.195
It could be a curve like this one that goes here to here

2643
01:51:05.195 --> 01:51:06.795
or a curve that goes like this one to here.

2644
01:51:07.095 --> 01:51:10.755
If it has a clear pattern, your residual plot, it means

2645
01:51:10.755 --> 01:51:14.035
that your original raw data was non-linear,

2646
01:51:14.375 --> 01:51:16.675
it was not linear and you cannot make predictions with it.

2647
01:51:17.465 --> 01:51:20.875
However, if your residual plot, first of all,

2648
01:51:20.895 --> 01:51:22.555
if all the points are on your line of best fit,

2649
01:51:22.555 --> 01:51:23.595
you'll have a residual plot

2650
01:51:23.595 --> 01:51:25.275
that is just a straight line along here.

2651
01:51:25.505 --> 01:51:27.595
It's the only pattern that indicates

2652
01:51:27.785 --> 01:51:29.475
that you have a linear set of data.

2653
01:51:30.135 --> 01:51:33.065
However, if your residual plot has no pattern

2654
01:51:33.165 --> 01:51:35.825
and it is a mess, you have linear data.

2655
01:51:36.685 --> 01:51:37.825
So it's a bit of a weird one.

2656
01:51:37.825 --> 01:51:39.345
But if your residual plot has no

2657
01:51:39.345 --> 01:51:40.545
pattern, think of it as linear.

2658
01:51:40.615 --> 01:51:41.745
This is more of a niche thing.

2659
01:51:41.745 --> 01:51:43.665
It doesn't come up all that often. It does come up.
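
NOTE
A short sketch of a residual check, with residual taken as actual y minus predicted y. The data are made up so that y grows like x squared, which means a straight line is the wrong model and the residuals show a clear curve. The helper name is hypothetical.
```python
# Residual = actual y minus predicted y from the least-squares line.
# Made-up data that is deliberately curved (y = x squared).
from statistics import mean
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b
a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
# Positive, then negative, then positive again: a U-shaped pattern, so the raw data is not linear.
print([round(e, 2) for e in residuals])
```
If the residuals had bounced above and below zero with no pattern, the linear model would have been reasonable.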

2660
01:51:44.965 --> 01:51:47.945
Now last quick point, this we're only gonna cover very

2661
01:51:47.945 --> 01:51:50.345
quickly and it would take far longer for you to sort

2662
01:51:50.345 --> 01:51:52.825
of understand it in its entirety.

2663
01:51:53.125 --> 01:51:55.345
So we are just gonna cover it very, very briefly.

2664
01:51:56.365 --> 01:51:58.825
The only point I want you to get from this is one,

2665
01:51:59.175 --> 01:52:04.055
this diagram here must be on your summary book.

2666
01:52:04.055 --> 01:52:06.935
It is a non-negotiable. It must be on your summary book.

2667
01:52:06.955 --> 01:52:09.295
Now what does this mean? Now let's say you've done

2668
01:52:09.295 --> 01:52:10.655
that residual test and you find

2669
01:52:10.655 --> 01:52:11.855
that your data is non-linear.

2670
01:52:11.955 --> 01:52:13.175
You go back to your raw data

2671
01:52:13.235 --> 01:52:14.975
and you think that your data has a little bit,

2672
01:52:15.275 --> 01:52:17.095
it might be like this one on the top left here,

2673
01:52:17.365 --> 01:52:18.855
it's got a little, it looks pretty straight,

2674
01:52:18.855 --> 01:52:20.495
but it's got a little curve this way.

2675
01:52:20.645 --> 01:52:22.295
It's got a little bit of a curve like that.

2676
01:52:23.005 --> 01:52:25.975
What we do is we set that data up on this circle

2677
01:52:26.115 --> 01:52:27.495
and we put it in that top left.

2678
01:52:27.835 --> 01:52:28.935
And what we say is, well,

2679
01:52:28.985 --> 01:52:30.935
let's transform our data to make it linear.

2680
01:52:30.995 --> 01:52:32.775
We want to make predictions, we want to be able

2681
01:52:32.775 --> 01:52:34.855
to make predictions, but because it's non-linear according

2682
01:52:35.055 --> 01:52:37.455
to my residual plot, I can't make predictions.

2683
01:52:37.795 --> 01:52:39.175
How can I make predictions with it?

2684
01:52:39.445 --> 01:52:41.615
Well, I need to linearize the data

2685
01:52:42.555 --> 01:52:43.895
and to linearize the data,

2686
01:52:44.375 --> 01:52:47.565
I do this: I do one of the three transformations there.

2687
01:52:47.885 --> 01:52:50.325
I do Y squared log X or one over X.

2688
01:52:50.345 --> 01:52:52.005
And then you'll say to me, which one do I do?

2689
01:52:52.775 --> 01:52:55.635
You do all three. You do all three individually.

2690
01:52:55.975 --> 01:52:57.395
And when you do all three individually,

2691
01:52:57.575 --> 01:52:59.515
you then calculate your R value

2692
01:52:59.615 --> 01:53:00.995
for each of those sets of data.

2693
01:53:01.215 --> 01:53:04.035
So you'll do your raw X versus Y squared,

2694
01:53:04.135 --> 01:53:06.715
you'll do your raw Y versus log X

2695
01:53:06.975 --> 01:53:09.355
and you'll do your raw Y versus one over x.

2696
01:53:09.855 --> 01:53:12.555
You calculate an R value for each of those three sets

2697
01:53:12.555 --> 01:53:15.635
of data and the R value that is closest to one

2698
01:53:15.735 --> 01:53:19.155
or negative one, whichever one is the strongest R value is

2699
01:53:19.155 --> 01:53:20.475
the transformation you choose.

2700
01:53:20.745 --> 01:53:21.835
It's a little bit of trial and error,

2701
01:53:21.835 --> 01:53:23.675
it's a little bit annoying, but you have to do it

2702
01:53:24.695 --> 01:53:25.715
and it looks something like this.
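
NOTE
The trial-and-error step can be sketched as: apply each candidate transformation, compute r for each, and keep the one closest to 1 or negative 1. The data here are made up (chosen so that y grows roughly like log x), log base 10 is an assumption, and the function name is hypothetical.
```python
# Try y squared, log x and 1/x, then pick the transformation with the strongest r.
from math import log10, sqrt
from statistics import mean
xs = [1, 2, 4, 8, 16]              # made-up raw x values
ys = [1.0, 2.1, 2.9, 4.1, 5.0]     # made-up raw y values
def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
candidates = {
    "y squared": pearson_r(xs, [y ** 2 for y in ys]),    # raw x versus y squared
    "log x": pearson_r([log10(x) for x in xs], ys),      # log x versus raw y
    "1/x": pearson_r([1 / x for x in xs], ys),           # 1/x versus raw y
}
# Strongest means largest absolute value, so an r near -1 counts as much as one near 1.
best = max(candidates, key=lambda name: abs(candidates[name]))
for name, r in candidates.items():
    print(f"{name}: r = {r:.4f}")
print("chosen transformation:", best)
```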

2703
01:53:26.015 --> 01:53:29.355
So as you can see here, I've got a raw set of data Y

2704
01:53:29.355 --> 01:53:32.555
and X, which needed transforming

2705
01:53:32.555 --> 01:53:34.875
because it was not, it was not very linear.

2706
01:53:35.215 --> 01:53:37.955
So therefore I transformed my X values and I did X squared.

2707
01:53:38.295 --> 01:53:40.275
And as you can see, it worked pretty well.

2708
01:53:40.435 --> 01:53:41.915
I obviously got a pretty good R value.

2709
01:53:42.135 --> 01:53:44.235
So therefore we then can graph it

2710
01:53:44.235 --> 01:53:46.915
and you graph it as Y versus X squared.

2711
01:53:46.915 --> 01:53:49.035
And as you can see, this is what that data looks like.

2712
01:53:49.135 --> 01:53:52.155
So this was pre-transformation, clearly got a bit of a curve,

2713
01:53:52.175 --> 01:53:56.975
and if I go back to my curve here, I'm gonna use, I'm, my,

2714
01:53:57.035 --> 01:53:58.815
whoops, I'm gonna use my X squared

2715
01:53:58.815 --> 01:54:00.095
because I'm in my bottom right.

2716
01:54:00.115 --> 01:54:02.415
So I'm down here in my bottom right as you can see

2717
01:54:03.165 --> 01:54:05.655
down here in my bottom right, I ended up using my X squared.

2718
01:54:05.875 --> 01:54:07.695
So I go here, I did my X squared

2719
01:54:07.835 --> 01:54:09.855
and this was my pre transformation

2720
01:54:10.395 --> 01:54:12.855
and this here was my post transformation.

2721
01:54:13.315 --> 01:54:16.415
So as you can see, I got a really high R squared, which also

2722
01:54:16.415 --> 01:54:17.495
means I got a really high R.

2723
01:54:18.235 --> 01:54:20.415
But the most important thing is when I plotted it,

2724
01:54:20.775 --> 01:54:23.415
I made sure to note that it has been transformed.

2725
01:54:23.475 --> 01:54:26.695
It is no longer raw data, it has been transformed.

2726
01:54:26.695 --> 01:54:28.975
So if this was age down here, let's just say it was age,

2727
01:54:29.515 --> 01:54:33.575
I'd go bracket, age bracket squared.

2728
01:54:33.835 --> 01:54:36.775
So I'd say age squared. Y, I would say the same.

2729
01:54:36.795 --> 01:54:38.015
So let's just say, um,

2730
01:54:38.115 --> 01:54:41.855
in this case here we're talking about height, um, or weight

2731
01:54:42.155 --> 01:54:43.215
or maybe shoe size.

2732
01:54:43.555 --> 01:54:45.935
I'd say shoe size. Oh, shoe size is categorical,

2733
01:54:45.935 --> 01:54:47.095
so I need something numerical.

2734
01:54:47.435 --> 01:54:51.435
Um, let's just say, um, yeah, we'll say weight,

2735
01:54:51.815 --> 01:54:56.065
weight, you'd put as y. Um, so you'd just say weight

2736
01:54:56.065 --> 01:54:57.545
because that hasn't been transformed.

2737
01:54:57.685 --> 01:55:00.825
But your age needs to be age squared

2738
01:55:00.825 --> 01:55:02.065
because it has been transformed.

2739
01:55:02.565 --> 01:55:04.305
So that is your modeling data.

2740
01:55:04.855 --> 01:55:07.065
From your modeling data, you should be able to interpret

2741
01:55:07.085 --> 01:55:09.345
and calculate your line of best fit, do your transformations,

2742
01:55:09.345 --> 01:55:12.225
look at your residual plots and look at your R squared.

2743
01:55:13.125 --> 01:55:14.585
What's really important from here is

2744
01:55:14.585 --> 01:55:16.345
that then the last topic builds upon this

2745
01:55:16.345 --> 01:55:17.625
and you go through time series.

2746
01:55:17.645 --> 01:55:19.225
Now I'm not gonna go through time series.

2747
01:55:19.555 --> 01:55:21.865
There are 30, a little bit less than that.

2748
01:55:21.865 --> 01:55:26.105
20 something slides here that are available in the PDF

2749
01:55:26.215 --> 01:55:27.905
that you'll be able to look at,

2750
01:55:27.905 --> 01:55:29.065
which will be below this video.

2751
01:55:30.455 --> 01:55:33.925
These slides cover everything else that you need to know.

2752
01:55:34.145 --> 01:55:35.925
So they cover all of your time series.

2753
01:55:36.355 --> 01:55:38.685
They cover all of how to manipulate time series.

2754
01:55:38.875 --> 01:55:40.285
This is just a very fancy way

2755
01:55:40.285 --> 01:55:41.805
of saying scatter plots with time at the bottom.

2756
01:55:42.895 --> 01:55:45.385
Nonetheless, that's everything for today.

2757
01:55:45.965 --> 01:55:47.865
Um, if you have any questions,

2758
01:55:47.925 --> 01:55:49.545
please utilize the chat in the last minute

2759
01:55:49.645 --> 01:55:50.785
or if you're not at the premiere,

2760
01:55:51.005 --> 01:55:52.825
please make sure you just look through the chat.

2761
01:55:52.825 --> 01:55:53.985
There will be your an,

2762
01:55:53.985 --> 01:55:55.705
your questions will probably have already been asked

2763
01:55:55.725 --> 01:55:56.945
and have probably already been answered.

2764
01:55:57.215 --> 01:55:59.385
Otherwise, I'll be seeing you throughout the year

2765
01:55:59.385 --> 01:56:01.905
for all the other general maths uh, lectures.

2766
01:56:02.455 --> 01:56:03.985
Good luck and I'll see you then.
